Introduction

The development of psycholinguistic databases which cover norms for affective language features has allowed a deeper investigation into the impact of emotional content of words on language processing. Word emotionality impacts different language-related processes, such as word recognition (Kuperman et al., 2014), acquisition of abstract words (Guasch & Ferré, 2021), and learning new words in a foreign language (Ferré et al., 2015; Frances et al., 2020). Language comprehension seems to be influenced by emotionality on a word level (Citron, 2012; Opitz & Degner, 2012; Scott et al., 2009), sentence level (Ding et al., 2020) as well as text level (Leon et al., 2010, see Hinojosa et al., 2020, for a recent overview). In the case of language production, it appears that word emotionality affects picture naming (Hinojosa et al., 2010; White et al., 2016) and word generation (Cato et al., 2004). However, the effect of word emotionality is not limited to language processing. It also affects other language-related cognitive processes, such as perception (Gendron et al., 2012; Lindquist et al., 2006), attention (Kanske & Kotz, 2011), learning (Shablack et al., 2020; Snefjella et al., 2020), and memory in native speakers (Ferré et al., 2018; Zimmerman & Kelley, 2010) and bilinguals (Ferré et al., 2010; Marian & Kaushanskaya, 2008).

Two main theoretical approaches to affective experiences and emotions (Tyng et al., 2017) reflect the characterisation of language emotionality. The first approach defines word emotionality through three main dimensions: valence (hedonic tone of stimuli), arousal (degree of activation or excitement associated with stimuli), and dominance (degree of control over stimuli) (Barrett & Russell, 1999; Bradley & Lang, 1999; Russell, 2003). This approach is known as the dimensional or circumplex model of emotions. Normative studies commonly include ratings for valence and arousal. On the other hand, dominance has been less studied, and its ratings are usually not collected in normative studies (Xu et al., 2021). The vast majority of research on valence has found that positive words are processed faster (Kousta et al., 2009; Vinson et al., 2014; Yap & Seow, 2014) and remembered better than neutral words (Ferré, 2003). In contrast, research on negative words has not yielded consistent results: while some studies report that negative words are processed faster than neutral (Kousta et al., 2009; Vinson et al., 2014; Yap & Seow, 2014), others describe the opposite pattern (Estes & Adelman, 2008; Kuperman et al., 2014; Larsen et al., 2006). The inconsistent results may be partly explained by an interaction with arousal (Larsen et al., 2008; Vieitez et al., 2021). The general assumption is that, due to the importance of emotional content for survival, additional attentional resources are involved in the processing of positive and negative words compared with neutral words (e.g., Kousta et al., 2009; Yap & Seow, 2014). It seems that additional linguistic features modulate the emotional effect, including frequency (Scott et al., 2009) and concreteness (Palazova et al., 2013), as well as non-linguistic factors such as participants' mood (Sereno et al., 2015). 
Finally, recent research has shown that the effect of valence on cognitive processing might be task-dependent (Crossfield & Damian, 2021). Arousal also seems to have a facilitative effect on language processing (Delaney-Busch et al., 2016). Still, this effect might be less clear than the effect of valence, as some authors report that calming words are processed faster than arousing words (Kuperman et al., 2014). As in the case of valence, the effect of arousal appears to be modulated by other linguistic features of words, such as concreteness (Yao et al., 2016), which might have contributed to inconsistent findings. Finally, arousal and valence may interact (for an overview see Hinojosa et al., 2020; Hofmann et al., 2009). Research has shown that low-arousal positive words and high-arousal negative words are processed faster and exhibit greater neural activation than high-arousal positive words and low-arousal negative words (Citron et al., 2014; Larsen et al., 2008, but see Delaney-Busch et al., 2016, and Kuperman et al., 2014, for non-interactive effects of valence and arousal).

The second perspective on human affective space (and word emotionality) comes from discrete emotion theories, pioneered by Ekman (1993). It started as further differentiation of positive and negative emotional valence. The discrete emotion approach assumes the existence of several biologically determined emotions, often called core emotions (Nesse, 1990). Different authors have proposed a different number of discrete emotions, e.g., seven (Fahad et al., 2021), eight (Harmon-Jones et al., 2016; Strauss & Allen, 2008), nine (Consedine & Fiori, 2009) or 12 (Martinent et al., 2012). Still, it seems that most authors agree on five basic emotions. These basic emotions can be identified by human facial and vocal expressions, and are more or less consistent across different age groups and cultures: happiness, sadness, anger, fear, and disgust (Briesemeister et al., 2011a). Although relatively understudied, discrete emotions seem to affect language processing. Briesemeister et al. (2011a) examined the effect of discrete emotions on language processing in a lexical decision task. The results indicate that happiness-related words are processed faster than words related to negative emotions or neutral words. Furthermore, disgust-related words appear to take more processing time than words related to anger and fear, or neutral words. Given that Briesemeister et al. (2011a) matched valence and arousal between words related to different negative emotions, these results suggest that the effect of discrete emotions on language processing cannot be attributed to valence or arousal. Studies using brain imaging techniques further emphasised the importance of taking both approaches into consideration (dimensional and discrete emotions) in explaining emotional word processing. 
For example, electroencephalography (EEG) studies have shown that happiness (as a discrete emotion) and positive valence (as an emotional dimension) influence different event-related brain potential (ERP) components (Briesemeister et al., 2014). More precisely, happiness has been found to affect the N1 wave, whereas positivity affects N400-like components. Functional magnetic resonance imaging (fMRI) studies suggest that different brain regions might be involved in the processing of happiness vs. positivity (Briesemeister et al., 2015).

Most of the existing affective norms cover word emotionality from the dimensional perspective (e.g., Chinese: Xu et al., 2021; Croatian: Ćoso et al., 2019; Dutch: Moors et al., 2013; English: Bradley & Lang, 1999; Finnish: Söderholm et al., 2013; French: Monnier & Syssau, 2014; German: Citron et al., 2020; Kanske & Kotz, 2010; Italian: Montefinese et al., 2014; Polish: Imbir, 2015; Portuguese: Soares et al., 2012; Spanish: Guasch et al., 2016; Stadthagen-González et al., 2017). On the other hand, discrete emotion norms are currently available for only five languages: English (Stevenson et al., 2007), German (Briesemeister et al., 2011b), Spanish (Ferré et al., 2012, 2017; Hinojosa et al., 2016; Stadthagen-González et al., 2018), French (Syssau et al., 2021), and Turkish (Kapucu et al., 2021). The first database of discrete emotion norms used English words from ANEW (Bradley & Lang, 1999), the seminal study in which ratings for emotional dimensions were collected. It consists of 1034 words evaluated on five basic discrete emotions (Stevenson et al., 2007). The second study on discrete emotions for English was conducted by Strauss and Allen (2008). It includes 462 words on five basic discrete emotions, with the addition of surprise and anxiety. The Discrete Emotion Norms for Nouns–Berlin Affective Word List, DENN-BAWL (Briesemeister et al., 2011b), appeared shortly after, providing discrete emotion norms for 2000 German nouns. Probably the most comprehensive database is the one developed by Stadthagen-González et al. (2018) for Spanish, and it includes norms for 10,491 Spanish words for five basic emotions. Discrete emotion ratings for Spanish are also available in several smaller studies (Ferré et al., 2017; Hinojosa et al., 2016). 
Normative data for discrete emotions are available for 524 French personality traits (Ric et al., 2013) and 1031 French words (Syssau et al., 2021), while the most recent database, FANCat, brings norms for 10 discrete emotions: fear, anger, disgust, sadness, anxiety, awe, excitement, contentment, amusement, and serenity. Finally, norms for five discrete emotions are available for a set of 2031 Turkish words (Kapucu et al., 2021). To our knowledge, there are no published normative studies on discrete emotions for any of the Slavic languages.

The main goal of this study is to broaden the existing psycholinguistic database of affective norms in Croatian (Ćoso et al., 2019) with norms for five discrete emotion categories: happiness, anger, sadness, fear, and disgust. The existing affective dataset (Ćoso et al., 2019) includes 3022 words rated on two emotional dimensions, valence and arousal, as well as on concreteness. The CROWD-5e database brings ratings for five discrete emotions for the same 3022 Croatian words. Previous research has shown correlations between discrete emotion ratings and several lexico-semantic and affective features of words (e.g., Ferré et al., 2017; Kapucu et al., 2021; Syssau et al., 2021). Therefore, the second goal of this study is to investigate the relationship between discrete emotions and other linguistic properties (word frequency, length, subjective frequency, imageability, concreteness), as well as emotional dimensions (valence and arousal). Finally, sex effects are examined based on prior evidence of sex differences in both discrete emotion ratings (Ferré et al., 2017; Hinojosa et al., 2016) and affective dimension ratings (Monnier & Syssau, 2014; Montefinese et al., 2014; Soares et al., 2012). Norms for five discrete emotion categories collected for the CROWD-5e database will allow psycholinguistic research in Croatian, including investigation into bilingual language processing, and additional cross-cultural comparison and understanding of word-level affective experiences and emotions.

Method

Participants

Word ratings were collected from 1239 Croatian native speakers, of whom 597 were female (48.18%) and 642 were male (51.82%). The participants’ mean age was 22.05 years (SD = 6.33). Participation was entirely voluntary. Most of the participants were recruited from the scientific community, with the help of colleagues, scientists, student unions, and social networks. A smaller part came from the authors’ private networks. The participants were either contacted directly or recruited using the snowball method (being nominated by someone who participated in the study). The ratings were collected between October 2019 and November 2021.

Materials and procedure

The norms were collected for 3022 Croatian words. The selected set of words was taken from the study by Ćoso et al. (2019), the only Croatian normative study with valence and arousal norms. Therefore, discrete emotion norms for the same set of words will allow further examination of the relationship between ratings for dimensional and discrete emotions. In their study, Ćoso et al. (2019) included Croatian translation equivalents of Spanish words, previously published in two databases: 875 words were taken from the Madrid Affective Database for Spanish (Hinojosa et al., 2016), while an additional 2266 words were taken from Ferré et al. (2017). Spanish words were directly translated into Croatian by two independent experts, Croatian–Spanish bilinguals. In cases when their translations did not match, Croatian-Spanish pairs were compared with English-Spanish equivalents, provided in the Spanish databases. Finally, two Croatian native speakers with master’s degrees in Croatian language and literature checked the list of translations. A total of 119 original Spanish words were excluded from the study, since the experts did not reach agreement in finding adequate Croatian translations. All words included in CROWD-5e were previously rated on valence and arousal in both Spanish and Croatian (Ferré et al., 2012; Guasch et al., 2016; Hinojosa et al., 2016; Redondo et al., 2007; Stadthagen-González et al., 2017), as well as on five discrete emotions in Spanish (Ferré et al., 2017; Hinojosa et al., 2016).

The 3022-word set was randomly divided into 12 questionnaires for each variable—happiness, sadness, disgust, fear, and anger—with 10 questionnaires consisting of 252 words and two consisting of 251 words. In total, 60 questionnaires were created. Word order within each questionnaire was randomised for each participant. The participants were first given a statement with a brief description of the purpose of the study and information about data confidentiality. They were informed that each questionnaire would take approximately 10–15 minutes to complete. They were advised not to overthink their responses, as there were no right or wrong answers. No time limit was set for the rating process, so reaction times were not collected.
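
The partition described above can be illustrated with a short sketch. It is not the authors' actual procedure: the function name, the fixed seed, and the placeholder word identifiers are assumptions introduced only for illustration. It shows why splitting 3022 words into 12 lists of near-equal size yields ten lists of 252 words and two lists of 251 words.

```python
import random


def split_into_questionnaires(words, n_lists=12, seed=0):
    """Randomly partition `words` into `n_lists` lists of near-equal size."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    shuffled = list(words)
    rng.shuffle(shuffled)
    base, extra = divmod(len(shuffled), n_lists)
    lists, start = [], 0
    for i in range(n_lists):
        # The first `extra` lists receive one additional word.
        size = base + (1 if i < extra else 0)
        lists.append(shuffled[start:start + size])
        start += size
    return lists


# Placeholder identifiers stand in for the 3022 Croatian words.
words = [f"word_{i}" for i in range(3022)]
questionnaires = split_into_questionnaires(words)
sizes = sorted(len(q) for q in questionnaires)
```

Repeating the same partitioning step for each of the five emotions would give the 60 questionnaires mentioned in the text.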

The questionnaires were divided into two parts. The first part consisted of a page with a series of demographic questions (age, sex, education level, and university they attended), and a page with detailed instructions on how to rate words. The second part consisted of the selected words, distributed over 10 pages (nine with 25 words and one with either 26 or 27 words). The instructions were similar to those used in previous studies that collected ratings for discrete emotions (e.g., Ferré et al., 2017; Hinojosa et al., 2016):

“You will be presented with a list of words. Your task is to rate them according to the emotional category of [fear, anger, happiness, sadness or disgust]. Each word should be rated on a scale from 1 to 5, with 1 being “not at all” (the word does not provoke [fear, anger, happiness, sadness or disgust]) and 5 being “extremely” (the word provokes [fear, anger, happiness, sadness or disgust]).”

The Croatian version of the instructions is available in the Appendix. The words were rated on a 1–5 Likert-type scale, where 1 indicated “not at all” and 5 indicated “extremely” (see Stadthagen-González et al., 2017, for a similar procedure). In addition, participants could mark the “I am not familiar with the word” option in all versions of the questionnaire and for all variables in case they did not know the word. Participants had to rate all the words displayed on a page before they could move on to the following page. Each page included a short reminder of the instructions. A free PHP (hypertext preprocessor)-based application, TestMaker (Haro, 2012), was used to distribute the questionnaires and collect the responses. The questionnaires were distributed online in the form of a URL link. Each participant could complete a maximum of two questionnaires with different word lists and for different discrete emotions. Each questionnaire was rated by approximately 20 participants (for further details see the paragraph “Data trimming”). The participants finished the task in 12 minutes on average.

Results

Data trimming

A total of 1420 responses were collected from 60 questionnaires, 181 (12.75%) of which were excluded from subsequent analysis. First, 48 responses (3.38%) were eliminated for one or more of the following reasons: (1) more than 15% “I do not know the word” answers, i.e., 37 out of 251 words were unknown (N = 4); (2) all words were rated the same (automatic-like responses) (N = 35); (3) the standard deviation (SD) of a participant’s ratings was more than 2 SD below the mean SD for all 1420 responses (N = 44). An additional 133 responses were removed because their correlation with other participants’ ratings was lower than 0.1 (following a similar criterion to Brysbaert et al., 2014), indicating that these participants might have misunderstood the task. The remaining 1239 responses were adequate for analysis. The average number of observations per word (across variables) was 20.65 (SD = 1.61; range = [20–31]).
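
The exclusion criteria can be sketched in code as follows. This is an illustration under stated assumptions, not the authors' script. In particular, agreement is measured here against the mean rating of the other participants in the same questionnaire, and the SD cutoff is computed over the responses passed to the function, whereas the text computes it over all 1420 responses. The function name and the reason labels are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: an undefined correlation counts as no agreement
    return sxy / (sxx * syy) ** 0.5


def flag_responses(responses, max_unknown=0.15, min_r=0.1):
    """Return {participant: set of exclusion reasons} for one questionnaire.

    `responses` maps a participant id to a list of ratings (1-5), with None
    marking a word the participant did not know.
    """
    known = {p: [r for r in rs if r is not None] for p, rs in responses.items()}
    sds = {p: pstdev(v) for p, v in known.items() if len(v) > 1}
    sd_values = list(sds.values())
    sd_cutoff = mean(sd_values) - 2 * pstdev(sd_values)
    flags = {p: set() for p in responses}
    for p, rs in responses.items():
        if rs.count(None) / len(rs) > max_unknown:
            flags[p].add("unknown_words")
        if len(set(known[p])) <= 1:
            flags[p].add("uniform_ratings")
        if p in sds and sds[p] < sd_cutoff:
            flags[p].add("low_sd")
        # Agreement with the mean rating of all other participants, per word.
        mine, others = [], []
        for i, r in enumerate(rs):
            rest = [responses[q][i] for q in responses
                    if q != p and responses[q][i] is not None]
            if r is not None and rest:
                mine.append(r)
                others.append(mean(rest))
        if len(mine) < 2 or pearson(mine, others) < min_r:
            flags[p].add("low_agreement")
    return flags


# Invented toy data: five participants rating six words.
toy = {
    "A": [1, 2, 3, 4, 5, 1],
    "B": [1, 2, 3, 5, 5, 2],
    "C": [3, 3, 3, 3, 3, 3],        # same rating for every word
    "D": [5, 4, 3, 2, 1, 5],        # reversed pattern
    "E": [None, None, 2, 3, 4, 1],  # one third of the words unknown
}
toy_flags = flag_responses(toy)
```

A response would be dropped if its set of reasons is non-empty; because a response can meet several criteria at once, the per-criterion counts in the text need not sum to the number of excluded responses.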

Reliability and validity of the norms

Reliability was examined using the Spearman-Brown correction on the outcome of a split-half procedure. Fifty repetitions of the procedure were carried out on each of the 60 questionnaires. This was done using the multicon package (Sherman, 2015) in R (R Core Team, 2021). The mean split-half reliabilities of the 12 questionnaires for each variable were as follows: r = .93 for sadness (ranging from r = .91 to r = .95); r = .93 for happiness (ranging from r = .89 to r = .95); r = .92 for anger (ranging from r = .82 to r = .95); r = .91 for fear (ranging from r = .82 to r = .95); and r = .89 for disgust (ranging from r = .82 to r = .93).
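
For readers unfamiliar with the procedure, the following sketch shows one common way to compute a Spearman-Brown-corrected split-half reliability for a single questionnaire. It is written in Python rather than R and does not reproduce the multicon implementation; the splitting scheme (random halves of raters, correlation of per-word means) and all names are assumptions, and the data are simulated.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r):
    """Step a half-sample correlation up to the full-sample reliability."""
    return 2 * r / (1 + r)


def split_half_reliability(ratings, n_splits=50, seed=0):
    """Mean corrected split-half reliability over `n_splits` random splits.

    `ratings` is a list with one list of word ratings per rater, all in the
    same word order.
    """
    rng = random.Random(seed)
    raters = list(range(len(ratings)))
    n_words = len(ratings[0])
    estimates = []
    for _ in range(n_splits):
        rng.shuffle(raters)
        half = len(raters) // 2
        first, second = raters[:half], raters[half:]
        means_a = [mean(ratings[p][w] for p in first) for w in range(n_words)]
        means_b = [mean(ratings[p][w] for p in second) for w in range(n_words)]
        estimates.append(spearman_brown(pearson(means_a, means_b)))
    return mean(estimates)


# Simulated ratings (not study data): 20 raters, 30 words, noisy 1-5 scores.
sim = random.Random(1)
true_values = [(w % 5) + 1 for w in range(30)]
ratings = [[min(5, max(1, t + sim.choice((-1, 0, 1)))) for t in true_values]
           for _ in range(20)]
reliability = split_half_reliability(ratings)
```

The correction is needed because each half contains only half of the raters, so the raw correlation between halves underestimates the reliability of the full set of ratings.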

A common approach for examining validity is to compare ratings for some words with ratings from previously published databases. Since a previous Croatian database for discrete emotions does not exist, the ratings were compared with those collected for other languages: Spanish (Ferré et al., 2017; Hinojosa et al., 2016), English (Stevenson et al., 2007), and French (Syssau et al., 2021). To compare the ratings with the Spanish data, the words from Hinojosa et al. (2016) and Ferré et al. (2017) were merged into a single dataset because the words from the two databases do not overlap. A total of 3018 words from the merged dataset overlapped with the words from CROWD-5e. The Pearson correlations were as follows: r = .81 for happiness, anger, and sadness; r = .77 for fear; and r = .73 for disgust. On the other hand, 1083 words from CROWD-5e overlapped with the English database (Stevenson et al., 2007). The Pearson correlations for the English database were as follows: r = .84 for happiness; r = .83 for anger; r = .82 for sadness; r = .79 for fear; and r = .75 for disgust. Finally, 579 words overlapped with the French database (Syssau et al., 2021). The Pearson correlations were as follows: r = .79 for anger; r = .74 for sadness and fear; and r = .71 for disgust. The ratings for happiness could not be obtained from the French database because it was split into five positive emotions in that study. Thus, even though validity could not be examined by comparing the current ratings with those from other Croatian datasets, the high correlations with ratings from other languages indicate the high validity of our ratings.

Descriptive statistics

The complete CROWD-5e dataset with 3022 Croatian words is available in an Excel spreadsheet and can be accessed through the webpage https://doi.org/10.6084/m9.figshare.19221678.v9. Croatian words (column: riječ) are alphabetically ordered, together with their English equivalents (column: word). The first sheet contains information about the variables, the second sheet contains the complete data, while the third and fourth sheets contain sex-specific data. Discrete emotions are presented in the following order: happiness, anger, sadness, fear, and disgust. Mean value, SD, number of observations, and proportion of unknown words are available for each discrete emotion.

Descriptive statistics were calculated for all 3022 words. Table 1 presents descriptive statistics for all words and all discrete emotions, as well as the word with the highest rating in each discrete emotion category. It also shows values for affective dimensions and other psycholinguistic properties of words, collected from other normative studies (Ćoso et al., 2019; Peti-Stantić et al., 2021) and the Croatian web corpus, hrWaC (Ljubešić & Klubička, 2016). Figure 1 (diagonal) shows word distribution in the dataset for all discrete emotions.

Table 1 Descriptive statistics and examples of words with highest ratings for each discrete emotion

Fig. 1 Correlation and scatterplot matrix (with linear regression lines) between discrete emotions, and between discrete emotions and valence and arousal. The diagonal shows the distribution of ratings for all words in each discrete emotion category as well as valence and arousal.

The highest average ratings were collected for happiness (M = 2.30, SD = .77). All words had similar average ratings for negative emotions, ranging between 1.51 (disgust, SD = .52) and 1.58 (sadness, SD = .66). All discrete emotion ratings had a positive skew, with most words concentrated at the low end of the scale (Fig. 1). In the case of negative emotions, most ratings ranged between 1 and 2. Happiness, on the other hand, had a more balanced distribution, with most mean ratings between 1 and 3. Still, the answers were distributed across the entire five-point scale for all discrete emotions, which can be seen from the minimum and maximum values in Table 1. The positively skewed evaluations for discrete emotions are consistent with the Spanish data (Ferré et al., 2017; Stadthagen-González et al., 2018).

For each of the five discrete emotions, we further examined whether there were differences between nouns belonging to different noun categories. The nouns were classified by three independent evaluators, all of whom were philologists. The classification process was based on the Croatian noun categorisation system (e.g., Barić et al., 2005; Hudeček & Mihaljević, 2019), which defines nouns as words that denote beings (people, animals, plants; e.g., fotografkinja, photographer (fem.); konj, horse; limun, lemon), things (objects, places; e.g., bicikl, bicycle; bolnica, hospital) and occurrences (actions, emotions, concepts; e.g., osvajanje, conquest; frustracija, frustration; društvo, society). Table 2 shows the number of nouns per category (Beings, Things, Occurrences), as well as mean values and SDs (in parentheses) for each discrete emotion.

Table 2 Descriptive statistics for three noun categories

One-way analyses of variance (ANOVAs) were conducted to compare the mean ratings for each emotion between categories. Significant differences were then analysed through Bonferroni-corrected post hoc comparisons. Significant effects were found in all negative emotions: anger, F(2, 2112) = 80.59, MSE = 0.32, p < .001; sadness, F(2, 2112) = 94.22, MSE = 0.37, p < .001; fear, F(2, 2112) = 65.55, MSE = 0.27, p < .001; and disgust, F(2, 2112) = 34.96, MSE = 0.26, p < .001. Post hoc analyses showed that Occurrences were rated higher on anger, sadness, and fear compared with Beings and Things (all p < .001). In turn, Beings elicited higher ratings on anger, sadness, and fear than Things (all p < .001, except p = .002 for the comparison between Beings and Things in the sadness category). Finally, post hoc tests showed no significant differences in disgust ratings between Beings and Occurrences (p = .113), but nouns from the Things category were rated lower on disgust when compared with Beings and Occurrences (p < .001). No significant differences between noun categories were observed in the happiness category, F(2, 2112) = .405, MSE = 0.55, p = .667.
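
As a reminder of how the reported quantities relate to each other, the sketch below computes the F statistic, its degrees of freedom, and the MSE for a one-way ANOVA. It is a generic textbook computation on invented numbers, not the authors' analysis, and it omits p-values and the Bonferroni-corrected post hoc tests, which would require a statistics library. The within-groups degrees of freedom equal the number of observations minus the number of groups, so the reported value of 2112 with three categories corresponds to 2115 nouns.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, (df_between, df_within), MSE) for a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    group_means = [mean(g) for g in groups]
    # Between-groups sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups sum of squares: deviations from each group's own mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    mse = ss_within / df_within
    f_value = (ss_between / df_between) / mse
    return f_value, (df_between, df_within), mse


# Invented ratings for three hypothetical noun categories.
f_value, dfs, mse = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```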

Relationship between discrete emotions, affective dimensions, and psycholinguistic variables

Pearson correlations were calculated to analyse the pattern of correlations between the five discrete emotions (see Fig. 1). The results showed a significant negative correlation between happiness and all negative discrete emotions. The highest negative correlation was with anger (r = −.64, p < .001), followed by sadness and disgust (r = −.61, p < .001), and fear (r = −.59, p < .001). In addition, significant positive correlations were found between all negative discrete emotions, with the highest correlations between anger and sadness (r = .82, p < .001) and fear and sadness (r = .82, p < .001).

Pearson correlations between discrete emotions and emotional dimensions were also calculated. Values for valence and arousal were taken from the Croatian psycholinguistic database (Ćoso et al., 2019), which contains the same 3022 words as the present study. Valence positively correlated with happiness (r = .87, p < .001), and negatively with the four negative discrete emotions. The correlations ranged from r = −.74 (disgust) to r = −.80 (sadness), p < .001. Also, arousal showed lower but still significant correlations with discrete emotions, and a pattern opposite to valence was observed. To be more precise, arousal correlated negatively with happiness (r = −.33, p < .001), and positively with negative discrete emotions: fear (r = .66), anger (r = .65), sadness (r = .63), and disgust (r = .57), all p < .001 (Table 3). The highest positive correlation was found between arousal and fear, and the lowest between arousal and disgust.

Table 3 Pearson correlations between discrete emotions and other psycholinguistic variables

Correlations between discrete emotions and several psycholinguistic variables were also calculated (see Table 3). Correlations between discrete emotions, word length, and concreteness were calculated for all 3022 words, as they overlapped with words from the study by Ćoso et al. (2019). The results showed a significant correlation between word length and discrete emotion ratings: there was a small negative correlation with happiness (r = −.04, p < .05), and a positive correlation with all negative discrete emotions, p < .001. The concreteness variable correlated negatively with anger (r = −.14, p < .001), sadness (r = −.11, p < .001), and fear (r = −.06, p < .01). Although significant, these correlations were relatively low. In contrast, happiness and disgust did not correlate with concreteness.

Correlations between discrete emotion ratings and word frequency were also calculated. Word frequencies were derived from the Croatian Web Corpus, hrWaC (Ljubešić & Klubička, 2016), collected from the .hr top-level domain. The frequency per million of the exact word form was extracted and then converted to the Zipf scale (van Heuven et al., 2014). A significant positive correlation between Zipf and happiness (r = .28, p < .001) was found, as well as a significant negative correlation with all negative discrete emotions, ranging from r = −.13 (sadness) to r = −.23 (disgust), all p < .001.
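
The Zipf conversion can be written in one line. The sketch below uses the standard definition of the Zipf scale as the base-10 logarithm of the frequency per billion words, which equals the logarithm of the frequency per million plus 3. The function name is hypothetical, and the handling of zero frequencies (rejected here) is an assumption, since the text does not say how unattested forms were treated.

```python
import math


def zipf_from_fpm(freq_per_million):
    """Convert a frequency per million words to the Zipf scale."""
    if freq_per_million <= 0:
        # Assumption: non-positive frequencies are rejected rather than smoothed.
        raise ValueError("frequency per million must be positive")
    return math.log10(freq_per_million) + 3.0


zipf_common = zipf_from_fpm(100.0)  # a word occurring 100 times per million
zipf_rare = zipf_from_fpm(1.0)      # a word occurring once per million
```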

Correlations between discrete emotion ratings and other subjective variables were also examined. These subjective variables, obtained from a Croatian database by Peti-Stantić et al. (2021), included imageability, subjective frequency, and age of acquisition (AoA). Imageability correlated only with anger (r = −.10, p < .001). AoA showed a negative correlation with happiness (r = −.24, p < .001), whereas positive correlations between AoA and negative discrete emotions were observed. The lowest correlation was found for sadness (r = .13), followed by fear (r = .16), disgust (r = .20), and anger (r = .21), all p < .001. In other words, it seems that words related to positive discrete emotions are learned earlier in life than words related to negative emotions. Finally, subjective frequency positively correlated with happiness, r = .32, p < .001. On the other hand, there was a negative correlation between subjective frequency and negative discrete emotions, p < .001 (Table 3). This pattern is similar to the pattern found for corpus-based frequencies (Zipf), indicating that both subjective and objective frequencies tend to go in the same direction.

Sex differences

The correlations between male and female ratings were significant and strong for all discrete emotions. The lowest correlation was found for disgust, followed by happiness and fear. In contrast, the highest correlation was observed for anger and sadness. To further investigate potential differences between female and male mean ratings for each discrete emotion, paired t-tests were calculated. The results revealed that female participants rated words higher than male participants in three discrete emotion categories: anger (e.g., sakaćenje, mutilation), t(3021) = 10.46, p < .001; fear (e.g., embolija, embolism), t(3021) = 19.10, p < .001; and sadness (e.g., rat, war), t(3021) = 2.13, p = .034. On the other hand, male participants’ ratings were higher in the happiness category (e.g., cigla, brick), t(3021) = 19.23, p < .001. There was no significant difference for disgust, t(3021) = 1.03, p = .303. Complete descriptive statistics with correlations can be found in Table 4. The words with the highest differences between the two sexes for each discrete emotion are shown in Table 5, and the distribution of differences between male and female ratings in Fig. 2.
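
The paired comparison treats each word as one observation with two mean ratings, one per sex, which is why the tests above have 3021 degrees of freedom for 3022 words. A minimal sketch of the test statistic follows; the function name, the sign convention (first minus second sample), and the toy numbers are assumptions for illustration, and the p-value is omitted because it requires a t distribution from a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired t-test on equally long samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("samples must have equal length of at least 2")
    diffs = [a - b for a, b in zip(x, y)]
    # Standard error of the mean difference (sample SD with n - 1 denominator).
    # Note: this fails with a division by zero if all differences are equal.
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Invented mean ratings of four words by female and male participants.
female_means = [2, 3, 4, 5]
male_means = [1, 2, 4, 3]
t_value, df = paired_t(female_means, male_means)
```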

Table 4 Descriptive statistics and Pearson correlations by sex for each discrete emotion
Table 5 Words with the largest differences in ratings depending on the participants’ sex (when evaluated by male and female participants)

Fig. 2 Distribution of differences between male and female ratings for each discrete emotion (negative values: females’ ratings higher than males’; positive values: males’ ratings higher than females’)

Distribution of words across the five discrete emotions

To explore the distribution of words across the five discrete emotions, word ratings for each discrete emotion were first classified into two groups based on the criterion used in previous studies (Stadthagen-González et al., 2018; Syssau et al., 2021). The two groups were (1) high-level ratings, with an average rating of 3 or above; and (2) low-level ratings, with an average rating below 3. If a word’s ratings were below 3 in all discrete emotion categories, it was considered neutral. If a word’s ratings were 3 or above on more than one emotion, it was assigned to the discrete emotion with the highest rating. Three words (bombardirati, to bomb; svađa, quarrel; and sebičan, selfish) had the same ratings in two discrete emotion categories, so they were assigned to both emotions. A total of 2121 words (70.19%) were classified as neutral, and 631 words (20.88%) were categorised as happiness-related. A lower percentage of words were related to negative emotions: 131 words (4.33%) to sadness, 69 words (2.28%) to anger, 38 words (1.26%) to fear, and only 35 words (1.16%) to disgust.

To fully understand the distribution of words across discrete emotions, it is important to take into account the so-called purity of words per category (Table 6). Words were considered to be “pure” if their ratings were greater than or equal to 3 in only one discrete emotion. All words from the happiness category (100%) were categorised as “pure”. In contrast, negative emotion categories had a lower percentage of “pure” words. The highest percentage of “pure” words was found in the sadness category (67.94% of all sadness-related words). Anger and fear had a similar percentage of “pure” words: 41 in the anger category (59.42% of all anger-related words) and 23 in the fear category (60.53% of all fear-related words). Not only did disgust have the lowest number of words in the category, but it also had the lowest number of “pure” words (N = 17, 48.57% of all disgust-related words).

Table 6 Distribution of “pure” words across discrete emotions and neutral words

Words that are not “pure” are often classified as “mixed” words (Ferré et al., 2017; Stadthagen-González et al., 2018). As described above, all happiness-related words were classified as “pure”, meaning that “mixed” words were found only in negative emotion categories. In total, 100 words were classified as “mixed”, with ratings equal to or above 3 in more than one negative emotion category. Among these, ratings for 49 words were high in two categories, 33 in three categories, and 18 in all four negative emotion categories. Words that scored above 3 in all negative emotion categories can be considered the most negative words in the dataset.
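
The classification rules described in this section (a threshold of 3, assignment to the highest-rated emotion, ties assigned to both emotions, and the “pure” versus “mixed” distinction) can be summarised in a short sketch. The function name, the return format, and the example ratings are hypothetical and are not taken from the dataset.

```python
EMOTIONS = ("happiness", "anger", "sadness", "fear", "disgust")


def classify(ratings, threshold=3.0):
    """Return (categories, label) for one word.

    `ratings` maps each emotion to the word's mean rating. `categories` lists
    the emotion(s) with the highest rating at or above the threshold; `label`
    is "neutral", "pure", or "mixed".
    """
    high = {e: r for e, r in ratings.items() if r >= threshold}
    if not high:
        return [], "neutral"
    top = max(high.values())
    # Ties for the highest rating assign the word to every tied emotion.
    categories = [e for e in EMOTIONS if high.get(e) == top]
    label = "pure" if len(high) == 1 else "mixed"
    return categories, label


# Invented mean ratings for three hypothetical words.
pure_word = classify({"happiness": 4.1, "anger": 1.0, "sadness": 1.2,
                      "fear": 1.1, "disgust": 1.0})
mixed_word = classify({"happiness": 1.1, "anger": 3.5, "sadness": 3.2,
                       "fear": 2.0, "disgust": 1.4})
neutral_word = classify({"happiness": 2.0, "anger": 1.3, "sadness": 1.2,
                         "fear": 1.1, "disgust": 1.0})
```

Under these rules a word counts once per assigned category, so a word tied between two emotions contributes to both category totals.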

Ratings for 39 sadness-related words were above 3 in other negative emotion categories. Similarly, ratings for 26 anger-related words, 18 disgust-related words, and 14 fear-related words were also high on another negative emotion category. In addition, three words had the same ratings for two negative emotions: two words for anger and sadness, and one word for sadness and fear. Anger and sadness had the highest number of overlapping words, 21 to be precise. The number of overlapping words between different categories is shown in Table 7. As mentioned, there were 18 words with high ratings on all four negative emotions. A total of 33 words had high ratings in three different categories: 11 words for anger, sadness, and fear; 14 for anger, sadness, and disgust; four for anger, fear, and disgust; and four for sadness, fear, and disgust. Overall, it seems that “mixed” words are mostly present in a combination of anger and sadness, while disgust has the lowest percentage of “pure” words among all discrete emotions.

Table 7 Number of words with high ratings in two discrete emotions

Finally, a multidimensional scaling (MDS) procedure was used to obtain a graphical representation of the data. In general, MDS allows visualisation of the relations among data in complex datasets: each word is represented as a single dot while different dimensions of the data are taken into consideration. The MDS procedure took into account the five discrete emotion categories and all words, except the three words with the same ratings in two categories. The procedure used the SMACOF algorithm (De Leeuw, 1977; De Leeuw & Heiser, 1977), implemented in the smacof package (De Leeuw & Mair, 2009; Mair et al., 2022) for R (R Core Team, 2021). The two-dimensional solution obtained a stress-1 value of .062, indicating adequate goodness of fit. The MDS results are shown in Fig. 3.
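For readers unfamiliar with SMACOF, the following Python sketch shows the basic unweighted iteration (the Guttman transform) and Kruskal's stress-1 formula. It is not the authors' analysis, which was run with the smacof package in R. The use of Euclidean distances between five-emotion rating profiles as dissimilarities is an assumption, the profiles are invented, and the R package may normalise stress differently.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix between all pairs of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def raw_stress(delta, dist):
    """Sum of squared differences between dissimilarities and distances."""
    n = len(delta)
    return sum((delta[i][j] - dist[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def stress1(delta, dist):
    """Kruskal's stress-1: sqrt(raw stress / sum of squared distances)."""
    n = len(delta)
    denom = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
    return math.sqrt(raw_stress(delta, dist) / denom)


def smacof(delta, dims=2, iters=200, init=None, seed=0):
    """Unweighted metric SMACOF: repeat the Guttman transform `iters` times."""
    n = len(delta)
    rng = random.Random(seed)
    X = init if init is not None else [
        [rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
    for _ in range(iters):
        D = pairwise_distances(X)
        new_X = []
        for i in range(n):
            row = [0.0] * dims
            for j in range(n):
                if i != j and D[i][j] > 0:
                    w = delta[i][j] / D[i][j]
                    for k in range(dims):
                        row[k] += w * (X[i][k] - X[j][k])
            new_X.append([v / n for v in row])  # X_new = (1/n) * B(X) X
        X = new_X
    return X, stress1(delta, pairwise_distances(X))


# Invented profiles (happiness, anger, fear, sadness, disgust), not real data.
profiles = [
    [4.5, 1.0, 1.0, 1.1, 1.0],
    [4.0, 1.2, 1.1, 1.3, 1.0],
    [1.1, 3.8, 2.5, 3.2, 1.9],
    [1.0, 2.0, 4.1, 3.0, 1.5],
    [1.2, 1.5, 1.4, 1.6, 3.9],
    [1.5, 1.3, 1.2, 1.4, 1.1],
]
delta = pairwise_distances(profiles)  # assumption: Euclidean dissimilarities
coords, fit = smacof(delta)
print(round(fit, 3))
```

Each Guttman transform cannot increase the raw stress, which is why the algorithm is run for a fixed number of iterations here; a production implementation would normally stop once the improvement falls below a tolerance.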

Fig. 3 Scatterplot depicting the results of the MDS procedure: (a) words separated according to their affective valence; (b) words separated as “pure” and “mixed”; (c) negative “pure” words separated according to the highest-loaded emotion; (d) “mixed” negative words separated according to the highest-loaded emotion

Panel (a), derived from the MDS procedure, reveals that the horizontal dimension (dimension 1) of the MDS mainly reflects valence: positive words are grouped on the right side, neutral in the middle, and negative words on the left side of the graph. A significant Pearson’s correlation (r = .91, p < .001) between the coordinates in dimension 1 and the words’ valence confirmed this assumption. In panel (b), all positive words are “pure” words. In addition, positive “pure” words are grouped on the far right, neutral in the middle, while negative “mixed” words are grouped on the far-left end. The data in panel (a) and panel (b) indicate that “mixed” words, with high ratings on more than one negative emotion, were rated as more negative than “pure” words. Finally, in panel (c) and panel (d), only negative words were examined (coloured): panel (c) shows “pure” words, and panel (d) “mixed” words, coloured according to the highest-loaded emotion. Regarding negative “pure” words, panel (c) shows that words from the fear category are located mainly in the upper part of the graph, while disgust-related words are in the lower part of dimension 2. Anger and sadness occupy the central part of dimension 2, with a slight mixture between these two emotions. Also, a small mixture of sadness and fear can be seen in panel (c). In contrast, panel (d) does not show a clear pattern due to a large mixture of emotions.
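The check relating dimension 1 to valence is a Pearson correlation between two lists of equal length: each word's coordinate on dimension 1 and its valence rating. A minimal Python sketch is given below; the function name and the numbers are invented for illustration and do not reproduce the reported r = .91.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented values for illustration only; these are not CROWD-5e data.
dim1 = [-1.2, -0.8, -0.1, 0.3, 0.9, 1.4]      # MDS dimension 1 coordinates
valence = [1.5, 2.0, 3.1, 3.0, 4.2, 4.6]      # valence ratings of the same words
print(round(pearson_r(dim1, valence), 2))
```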

Discussion

The CROWD-5e database was created to allow research on the interplay between language and discrete emotions in the Croatian language. To achieve this, participants’ subjective ratings were collected for a large set of 3022 Croatian words, divided into five discrete emotion categories: happiness, fear, anger, sadness, and disgust. The strong positive correlations between participants’ ratings in the five discrete emotions indicated that interrater reliability was high, which is in line with previous research (Ferré et al., 2017; Hinojosa et al., 2016; Kapucu et al., 2021; Stadthagen-González et al., 2017). These findings suggest that the participants understood the instructions, even though no examples were given, and that the conceptual representation of each discrete emotion was similar in individuals who participated in the study. We also observed high correlations between CROWD-5e and previous databases for other languages such as English (Stevenson et al., 2007), Spanish (Ferré et al., 2017; Hinojosa et al., 2016), and French (Syssau et al., 2021). These results suggest that ratings for discrete emotions are generally stable across languages and cultures. In this vein, recent evidence points to the existence of universal structures in the representation of emotional concepts in language (Jackson et al., 2019).

Words in the CROWD-5e database were unequally distributed across discrete emotions. The majority of words were rated as neutral, followed by words that were rated high on happiness. When it comes to negative discrete emotions, the highest number of words fell into the sadness and anger categories. Of note, 100 words did not fall into a single negative emotion category, but were rather rated high on two or more negative discrete emotions. The database did not show a clear pattern for “mixed” words, as there was an overlap between all negative emotion categories. Still, categorising words by discrete emotions might be very important for research designs that rely on linguistic items. The existence of “mixed” words shows that a word can be related to more than one emotion, which could depend on the context in which it is used (Hoemann et al., 2017). Information on whether a word is “mixed” or “pure” with respect to a specific discrete emotion is relevant in studies examining the role of discrete emotions in word processing. For example, “mixed” words might be processed differently from “pure” words (Briesemeister et al., 2012).

In line with previous normative studies (Ferré et al., 2017; Hinojosa et al., 2016; Stadthagen-González et al., 2018; Syssau et al., 2021), the results of this study showed that words that were rated high on negative emotions were also rated low on happiness. In contrast, ratings for all negative discrete emotions correlated highly with each other. Of note, a consistent finding from the current and other normative studies (e.g., Stadthagen-González et al., 2018; Syssau et al., 2021) is that ratings on sadness are strongly related to ratings on both fear and anger. These results suggest there might be a link between the conceptual representations of sadness and anger, which possibly arises from the fact that both emotions are elicited by perceived goal loss (Lench et al., 2016). With regard to the relationship between fear and sadness, there is evidence indicating that they share some neural mechanisms (e.g., activation in medial orbitofrontal cortex and left amygdala; Wilson-Mendenhall et al., 2013).

Previous normative studies (Ferré et al., 2012) observed differences in emotional dimension ratings for different semantic categories. Thus, the possibility that similar differences could occur in the case of discrete emotions was also examined. To achieve this, nouns from the dataset were categorised into three categories—Beings, Things and Occurrences—following the Croatian noun categorisation system (Barić et al., 2005; Hudeček & Mihaljević, 2019). No significant differences were found between the noun categories in the happiness ratings. In the case of negative discrete emotions, Occurrences were rated higher than Beings, and Beings were rated higher than Things. The only exception to this pattern was that nouns from the Occurrences and Beings categories had similar scores in disgust, and both groups were rated higher than nouns from the Things category. This observation supports previous results from cluster analyses indicating that animal- and event-related stimuli, judged as morally or socially unacceptable, are the main elicitors of this emotion (Marzillier & Davey, 2004; Rozin et al., 2008). To the best of our knowledge, there is no previous research on the comparison of the emotional features of words from different categories. Using pictures, Cao et al. (2014) reported that negative images of animals relative to pictures of objects elicited increased amygdala activation. Thus, the higher biological significance of emotional living entities than that of emotional inanimate objects, which possibly has an adaptive phylogenetic origin, might explain the results. Furthermore, current data indicate that nouns that denote occurrences, including actions and events, were those that scored highest on fear, anger and sadness. This might be tentatively related to the variety of affective referents of these nouns, compared with the narrow semantic referents of words belonging to the categories of beings and things. 
In this sense, nouns denoting occurrences include those related to emotional actions (e.g., bombardiranje, bombardment; terorizam, terrorism), emotional events (e.g., smrt, death; kriza, crisis), social emotions (sram, shame; krivnja, guilt), or words with high emotional prototypicality scores (Pérez-Sánchez et al., 2021; e.g., srdžba, fury; strah, fear). Nonetheless, these findings open up new avenues for examining the causes of differences in emotionality between noun categories.

Previous studies have reported subtle differences between female and male ratings in some discrete emotion categories. However, female participants have often outnumbered male participants (e.g., Stevenson et al., 2007, 58.20%; Briesemeister et al., 2011b, 67.09%; Hinojosa et al., 2016, 76.82%; Ferré et al., 2017, 77.98%; Stadthagen-González et al., 2018, 78.86%; Kapucu et al., 2021, 62.34%; Syssau et al., 2021, 80.03%). In the current study, having a fairly balanced sample of male (51.82%) and female (48.18%) participants allowed for the examination of potential sex differences in discrete emotion ratings. In agreement with previous observations, the results showed that correlations between female and male ratings were high for all discrete emotions (Stadthagen-González et al., 2018; Stevenson et al., 2007). Nonetheless, additional analyses showed that female participants rated words from the anger, fear, and sadness categories higher than male participants, while male ratings on happiness were higher than female ratings. However, the magnitude of these differences is quite low (between 0.02 and 0.21 points). Previous observations indicate that sex differences follow a different pattern across studies for some discrete emotions. In this sense, Stevenson et al. (2007) found that female ratings on all negative emotion categories were higher than male ratings. Moreover, Ferré et al. (2017) reported higher male ratings for happiness and disgust, and higher female ratings for fear. Hinojosa et al. (2016) observed that male ratings were higher for anger and fear than female ratings. Finally, Syssau et al. (2021) found higher scores for words related to anger, sadness and disgust in male participants, while ratings for fear were higher in female participants. This contradictory pattern of results suggests that no strong conclusions can yet be drawn about sex differences in the assessment of discrete emotions.
Nonetheless, reporting these differences is important for researchers who need to select the words for their experiments (Syssau et al., 2021). This might be useful to overcome some limitations of previous studies investigating sex differences in the processing of emotional words, which did not differentiate between male and female emotional ratings when selecting the stimuli (e.g., Hofer et al., 2007; Shirao et al., 2005). To sum up, our findings show that although the ratings of female and male participants are highly consistent, some sex differences in the assessment of emotional words do exist, which is in line with previous studies reporting differences in the processing of emotional language between men and women (e.g., Hamann & Canli, 2004; Smith & Waterman, 2005).

Additionally, we examined the existence of associations between scores in discrete emotions and affective dimensions. Not surprisingly, happiness correlated positively with valence, whereas all four negative discrete emotions were rated low on valence, as in previous normative studies (e.g., Briesemeister et al., 2011b; Ferré et al., 2017; Syssau et al., 2021). Similar to earlier findings (e.g., Stadthagen-González et al., 2018), we found that words that were rated high on happiness were also rated low on arousal, whereas words that were rated higher on all negative emotions were rated higher on arousal. In particular, words associated with fear and anger were the most arousing, while those related to disgust were the least arousing. This pattern could be attributed to the frequent use of fear-related (e.g., silovanje, rape) and anger-related (e.g., terorizam, terrorism) words to denote highly arousing events, whereas words related to disgust (e.g., smrad, stink), sadness (e.g., depresivan, depressed), and happiness (e.g., obitelj, family) may have non-arousing referents. We further examined the relationship between discrete emotions and affective dimensions through MDS. The MDS graphical presentation showed that ratings on discrete emotions looked similar to the visual representation of ratings on the valence and arousal dimensions. Of note, while MDS dimension 1 showed a strong and positive correlation with valence, dimension 2 did not show such a strong correlation with arousal. This suggests that dimension 2 is related not only to arousal, but also to discrete emotions and their relationship to each other. This is particularly evident in the case of “mixed” words, which scored high on several negative discrete emotions and spread along dimension 2.
The high correlation between negative emotions and the strong correlation of MDS dimension 1 with valence suggest that a dimensional approach to emotions, in which the affective space is represented by valence and arousal, seems more likely. However, arousal differences between words associated with different negative emotions (e.g., fear vs. disgust) in dimension 2 indicate that a discrete emotion approach is also conceivable. Of note, the results of several studies that controlled for valence and arousal point to the existence of discrete emotion effects in the processing of negative words. These studies reported differences in the processing of fear-related and anger-related words using approaching-distancing tasks (Huete-Pérez et al., 2019; Santaniello et al., 2022). Such differences were also observed in the recall accuracy of fear-related and disgust-related words (Ferré et al., 2018). Moreover, there seems to be evidence suggesting that discrete emotions may serve as a basis for subsequent dimensional appraisal processes (Briesemeister et al., 2014). This finding cannot be interpreted only in terms of traditional affective-dimension or discrete-emotion accounts. Thus, it seems that the combination of different theoretical approaches is still needed in order to unravel the contribution of emotion to language processing, which is in line with the views that aim to integrate discrete and dimensional conceptions into a unified theoretical approach (Panksepp & Watt, 2011; Russell, 2003).

Previous research has reported a relationship between emotional features and several psycholinguistic variables in both normative and language-processing studies (see Citron, 2012 and Hinojosa et al., 2020, for reviews). We found that positive words tend to be shorter (see Hinojosa et al., 2016 and Syssau et al., 2021, for similar results), which might reflect increased efficiency in communicating more frequent positive events (Rozin et al., 2010). The results also showed that words conveying happiness had higher objective and subjective word frequency values. Apart from their consistency with results from other normative studies (e.g., Stadthagen-González et al., 2018), these findings are also in line with the so-called positivity bias in language use, that is, the tendency to use positive words more often than negative words in both written and spoken language (Augustine et al., 2011). This effect has been attributed to the positive implications of most events that we experience in our daily life (Rozin et al., 2010). In contrast, words denoting negative discrete emotions were less objectively and subjectively frequent, an effect that was particularly evident in disgust-related words (as in Ferré et al., 2017; Hinojosa et al., 2016). Of note, word frequency plays a relevant role in the processing of negative words. In particular, a processing advantage for low-frequency negative words has been reported at both lexical access (Scott et al., 2009) and post-lexical processing stages (Méndez-Bértolo et al., 2011). The current norms provide a useful tool for investigating whether this processing benefit extends to low-frequency words denoting different negative discrete emotions or is limited to only some of them.

The results also showed that words denoting happiness are acquired earlier than those conveying negative discrete emotions. A similar relationship between AoA and positive emotions has been found in previous normative studies (e.g., Ponari et al., 2018). Also, developmental studies that assessed children’s knowledge of emotional vocabularies reported that words expressing positive affective states are learnt earlier in life than those related to negative feelings (Bahn et al., 2017; Baron-Cohen et al., 2010; Li & Yu, 2015). All these findings suggest the existence of a positivity bias in word acquisition, which possibly arises from the use of motherese communicative styles and positive vocalisations by parents and caregivers when addressing infants (Dave et al., 2018). Additionally, although a positive correlation was observed between AoA and all negative discrete emotions (see also Stadthagen-González et al., 2018; Syssau et al., 2021), this correlation was lower for the sadness category. This finding is in line with studies showing that 2-year-old children are particularly confident when using the word “sadness” to label faces expressing this emotion, while correct matches between faces evoking other negative emotions, such as fear or disgust, and their corresponding labels are only evident in 5-year-old children (Widen & Russell, 2008).

Finally, we observed a relationship between some negative discrete emotions and two sensory-related psycholinguistic variables, concreteness (i.e., the extent to which a word’s referent can be experienced with the senses), and imageability (i.e., the extent to which a word elicits mental images related to different sensory modalities). In particular, more abstract words showed higher scores in sadness, fear, and anger, the latter being also associated with lower imageability ratings. Despite the fact that similar correlations were found in other normative studies (concreteness: Syssau et al., 2021; Stadthagen-González et al., 2018; imageability: Syssau et al., 2021), previous evidence indicates that affective dimensions make a key contribution to the processing of abstract words (e.g., Hinojosa et al., 2014; Ponari et al., 2018). From an embodied language perspective, these findings have led to the suggestion that emotional features are particularly relevant for the conceptual representation of abstract words, which lack sensory-related properties (Vigliocco et al., 2014). While the impact of discrete emotions on the processing of concrete and abstract words remains unexplored, the high negative correlation between anger and both concreteness and imageability suggests that this emotion might be relevant for the conceptual representation of words that elicit few mental images and have fewer sensory-related referents. Nonetheless, this possibility should be further tested in word processing studies.

In conclusion, the present study provides norms for 3022 Croatian words for the five discrete emotions: happiness, anger, fear, sadness, and disgust. This database expands the existing Croatian norms for valence, arousal, and concreteness for the same set of words (Ćoso et al., 2019). The results from our analyses highlight the need to control psycholinguistic variables and to consider both the dimensional and the discrete emotion approach when conducting research on the interplay between language and emotion (Hinojosa et al., 2020; Kousta et al., 2009). By enabling greater experimental control, the CROWD-5e database represents a valuable resource for future psycholinguistic research in Croatian.