In 1999, Margaret Bradley and Peter Lang provided the scientific community with The Affective Norms for English Words, popularly known as ANEW (Bradley & Lang, 1999). This instrument comprised a set of normative evaluative ratings for 1,034 words in the English language and marked a milestone in the study of emotion from the dimensional perspective (e.g., Bradley & Lang, 2000; Citron, Weekes, & Ferstl, 2014a; Kensinger, 2004; Kousta, Vinson, & Vigliocco, 2009). ANEW has been used in innumerable studies with diverse goals, and it has accrued almost 2,000 citations on Google Scholar.

The dimensional perspective conceptualizes emotion as having three basic underlying dimensions: valence, arousal, and dominance. Valence refers to the hedonic value of a specific emotion, ranging from unpleasant to pleasant, whereas arousal, ranging from calming to exciting, indicates the degree of activation experienced. Dominance, in turn, indicates the control experienced in relation to a given situation, and ranges from a submissive feeling to one of complete control (Bradley & Lang, 2000). Of the three dimensions, valence and arousal have proven to be the most relevant. Over the last twenty-five years, research into emotion and language has made great use of the fact that words can be characterized according to these dimensions (Ferré, Guasch, Martínez-García, Fraga, & Hinojosa, 2017). Thus, pleasant, unpleasant, and neutral words tend to be distributed in the affective space defined by valence and arousal in such a way that they adopt a boomerang shape (e.g., Guasch, Ferré, & Fraga, 2016; Montefinese, Ambrosini, Fairfield, & Mammarella, 2014; Moors et al., 2013; Redondo, Fraga, Padrón, & Comesaña, 2007; Soares, Comesaña, Pinheiro, Simões, & Frade, 2012; Stadthagen-Gonzalez, Imbault, Pérez Sánchez, & Brysbaert, 2017; Warriner, Kuperman, & Brysbaert, 2013). This shape reflects the fact that pleasant and unpleasant words (and other stimuli) tend to have higher arousal levels than do neutral words, evidencing a quadratic relationship between the two dimensions. The usefulness of this theoretical approach and the instruments associated with it led to the development of new databases in languages other than English. The first adaptation of the ANEW was developed in Spanish by Redondo et al. in 2007. Its use spread quickly, which has been of great advantage in studies looking at the effects of the affective content of words in a language spoken by millions of people around the world.
Indeed, the impact of the Spanish version of the database can be seen in the almost 200 citations on Google Scholar. More recently, ANEW has been adapted to Brazilian and European Portuguese (Kristensen, Gomes, Justo, & Vieira, 2011, and Soares et al., 2012, respectively), to Italian (Montefinese et al., 2014), German (Schmidtke, Schröder, Jacobs, & Conrad, 2014), and Polish (Imbir, 2015).

Interestingly, it was also in 2007 that Stevenson, Mikels, and James published “Characterization of the Affective Norms for English Words by Discrete Emotional Categories.” This study was inspired by the other major theoretical approach to the description of emotions: the discrete emotion theory, according to which the number of discrete emotions is limited (i.e., happiness, anger, sadness, fear, and disgust). These emotions have specific characteristics, physiological correlates, and behavioral action tendencies, and are associated with distinct emotional experiences (Ekman, 1992). Through the work of Stevenson et al., researchers were provided with an instrument allowing for the study of affect in a comprehensive way, by taking both theoretical approaches into account. In Stevenson et al.’s own words, “whereas a dimensional approach can describe a number of broad features of emotion, and the categorical approach can capture more discrete emotional responses, the two can also be used in combination to supply experimenters with a more complete view of affect” (p. 1021). The utility of this joint approach to the study of emotion has been evidenced in recent research (e.g., Briesemeister, Kuchinke, & Jacobs, 2011a, b, 2014; Briesemeister, Kuchinke, Jacobs, & Braun, 2015; Silva, Montant, Ponz, & Ziegler, 2012).

With regard to the Spanish language, the publication of an earlier database that comprised affective ratings for 478 words (Redondo, Fraga, Comesaña, & Perea, 2005) and the Spanish adaptation of the ANEW (Redondo et al., 2007) opened the doors to the publication of successive new word databases, with seven new databases of emotional words having been published over the last decade. These databases improved on previous works in different ways, by considerably increasing the number and type (i.e., nouns, verbs, adjectives) of words (Stadthagen-González et al., 2017), by looking into the relation between affective and semantic variables (Ferré, Guasch, Moldovan, & Sánchez-Casas, 2012; Guasch et al., 2016), by providing discrete emotion ratings (Ferré et al., 2017; Hinojosa, Martínez-García, et al., 2016), and by establishing the relevance of affective words for emotional states like anxiety, depression, and anger (Pérez-Dueñas, Acosta, Megías, & Lupiáñez, 2010). Indeed, the availability of these databases made possible much recent research in this domain in Spanish (e.g., Altarriba & Basnight-Brown, 2011; Comesaña et al., 2013; Conrad, Recio, & Jacobs, 2011; Díaz-Lago, Fraga, & Acuña-Fariña, 2015; Ferré & Sánchez-Casas, 2014; Ferré, Sánchez-Casas, & Fraga, 2013; Ferré, Ventura, Comesaña, & Fraga, 2015; Gantiva, Delgado, & Romo-González, 2015; González-Villar, Triñanes, Zurrón, & Carrillo-de-la-Peña, 2014; Hinojosa, Méndez-Bértolo, & Pozo, 2010). The aim of the present article is to further enable the work of researchers interested in language and emotion by introducing emoFinder, a digital tool in which almost all these word databases are gathered in a single instrument. Hence, it provides dimensional ratings (i.e., valence, arousal, and dominance) as well as discrete emotion ratings (i.e., happiness, disgust, anger, fear, and sadness) for a large set of words.
Specifically, emoFinder includes subjective ratings for 16,375 different words in Spanish, with ratings in the main emotional variables, valence and arousal, for 14,414 words. EmoFinder has two main functionalities: It allows values to be found for a given set of words, and permits searches for words that fit particular criteria with respect to specific variables.

Importantly, more and more variables other than affective content are known to affect language processing. Some of these are lexical or sublexical, such as word length (Acha & Perea, 2008), frequency, neighborhood density, or age of acquisition (González-Nosti, Barbón, Rodríguez-Ferreiro, & Cuetos, 2014). Others are semantic in nature, such as concreteness, imageability, semantic ambiguity, or number of associates (see Pexman, 2012, for an overview). It is worth mentioning here that some of these variables affect not only language processing in general but also affective language processing in particular. For instance, lexical frequency has been reported to modulate the processing of emotional words, with the effects of emotional content being observed only in low-frequency words, and not in high-frequency words (Méndez-Bértolo, Pozo, & Hinojosa, 2011). Another variable that has been shown to interact with emotionality is concreteness. Indeed, some studies that have investigated this interaction reveal that emotional content affects the processing of abstract words more than that of concrete words (Ferré et al., 2015; Newcombe, Campbell, Siakaluk, & Pexman, 2012; Yao & Wang, 2014), in line with some proposals stating that emotional knowledge underlies meanings for abstract concepts more than for concrete concepts (Vigliocco, Meteyard, Andrews, & Kousta, 2009). Taking the above into consideration, the use of large corpora is very important here. Only such an approach can ensure the selection of a sufficient number of experimental stimuli, as well as both adequate control of lexico-semantic factors and their manipulation to study their interaction with emotional content in processing. Datasets of this kind are available in Spanish for objective measures, which are obtained by applying mechanical procedures to large corpora of words (e.g., EsPal; Duchon, Perea, Sebastián-Gallés, Martí, & Carreiras, 2013).
However, obtaining values for most semantic variables is a laborious and time-consuming procedure, involving large samples of participants who are asked to make subjective ratings. For this reason, we have also included in emoFinder the available data in Spanish for some of these subjective variables, in particular concreteness, imageability, familiarity, context availability, age of acquisition, and sensory experience, gathered from different normative studies. Sensory experience is possibly a less well-known variable that refers to “the degree to which words evoke a sensory or perceptual experience when read silently” (Juhasz & Yap, 2013, p. 160). Thus, it may reflect links between meaning and all sensory/perceptual modalities, going beyond the proper modality of reading, the visual one. Finally, we should note that the contribution of some of the (mainly objective) variables above to language processing in general, and to word recognition in particular, has been attested in recently published mega-studies in which participants are asked to make lexical decisions (i.e., lexical decision tasks, LDTs) with large sets of words (e.g., Balota et al., 2007; Keuleers, Lacey, Rastle, & Brysbaert, 2012). To allow researchers to study the contribution to word processing of not only objective variables but also affective and semantic variables, we have included in emoFinder the database of González-Nosti et al., which includes lexical decision times for a large set of Spanish words. Therefore, emoFinder, containing 16,375 Spanish words rated across the different databases therein, is potentially a very useful tool for researchers interested in the role of affective variables, and their interaction with semantic variables, in language processing.

Databases included in emoFinder

Currently, emoFinder incorporates ten databases that include emotional and/or lexical–semantic variables in Spanish (see Table 1 for the list of databases included, and Table 2 for descriptions of the variables).

Table 1 Chronological list of databases included in emoFinder, with their respective list of variables and the number (N) of words rated in each one
Table 2 Variable name, definition, and number (N) of unique words with data for each of the 15 variables available in emoFinder

Concerning the affective databases included in emoFinder, most of these have been compiled from a dimensional perspective, although two provide discrete emotion ratings instead. The databases that include ratings of the relevant variables from a dimensional perspective are those of Redondo et al. (2005; Redondo et al., 2007), Ferré et al. (2012), Guasch et al. (2016), Hinojosa et al. (Hinojosa, Martínez-García, et al., 2016; Hinojosa, Rincón-Pérez, et al., 2016), and Stadthagen-González et al. (2017).

The oldest normative study in Spanish taking the dimensional approach is the database of Redondo et al. (2005). Those authors collected ratings on valence and arousal for a set of 478 words by using the Self-Assessment Manikin (SAM; Bradley & Lang, 1994). The SAM is a 9-point graphical scale depicting a human figure that represents these two emotional dimensions. Concerning valence, a sad figure represents the negative anchor, whereas a smiling figure represents the positive one. Regarding arousal, the figure is relaxed at the lower anchor, but very excited at the opposite end. The SAM has also been used in the assessment of dominance, a dimension not included in the database of Redondo et al. (2005). In this case, the figure is very small when representing submission and very big (extending beyond the frame) at the dominant endpoint. Today the SAM is the most widely used method for assessing the affective properties of words and has been used in normative studies in other languages (Ho et al., 2015; Soares et al., 2012; Warriner et al., 2013).

Redondo and colleagues later extended their 2005 work by adapting ANEW (Bradley & Lang, 1999) to Spanish (Redondo et al., 2007), thus providing normative data on valence, arousal, and dominance for the 1,034 words included in the original dataset. Like Redondo et al. (2007), the other databases included in emoFinder that provide dimensional ratings also used the SAM to obtain ratings of valence (Ferré et al., 2012; Guasch et al., 2016; Hinojosa, Martínez-García, et al., 2016), arousal (Ferré et al., 2012; Guasch et al., 2016; Hinojosa, Martínez-García, et al., 2016), and dominance (Hinojosa, Rincón-Pérez, et al., 2016). The only exception is the database of Stadthagen-González et al. (2017). This is currently the largest Spanish database for valence and arousal, including ratings for 14,031 words. Although they did not use the SAM, the authors provided participants with a 1–9 scale for ratings, as in previous studies. Of note, Stadthagen-González et al. included in their dataset the words from the Spanish adaptation of ANEW (Redondo et al., 2007) as control words, obtaining correlations of r = .98 for valence and r = .75 for arousal across datasets. Thus, the ratings of Stadthagen-González et al. can be reliably combined with those of the previous databases.

Apart from emotional dimensions, some of the above databases also provide ratings for other relevant variables. For instance, the database by Ferré et al. (2012), despite its reduced size (380 words), includes words belonging to three semantic categories. Furthermore, it provides ratings of concreteness and subjective familiarity. Similarly, in their normative study involving a larger number of words (1,400), Guasch et al. (2016) included ratings for the variables concreteness, imageability and context availability. Along similar lines, Hinojosa et al. (Hinojosa, Martínez-García, et al., 2016; Hinojosa, Rincón-Pérez, et al., 2016) provided ratings of lexical and semantic variables. Thus, the 875 words included in their databases were also rated for concreteness (Hinojosa, Martínez-García, et al., 2016), familiarity, age of acquisition and sensory experience (Hinojosa, Rincón-Pérez, et al., 2016). Since all these words were also rated for valence and arousal, the datasets here provide valuable data for the study of the relationship between affective and semantic properties.

As we noted in the introduction, the other major theoretical approach in the study of emotions is the discrete emotions perspective. EmoFinder includes two databases developed from this approach (Ferré et al., 2017; Hinojosa, Martínez-García, et al., 2016). The first of these is the dataset of Hinojosa, Martínez-García, et al., previously mentioned. These authors, apart from collecting data on valence and arousal, asked participants to rate 875 words in the five discrete emotions included in the pioneering study of Stevenson et al. (2007): happiness, anger, sadness, fear, and disgust. Similarly, the database by Ferré et al. (2017), developed following the procedure of Hinojosa, Martínez-García, et al., provides discrete emotion ratings for a set of 2,266 words. Of note is that this dataset includes the words of the Spanish adaptation of ANEW (Redondo et al., 2007), as well as those rated by Ferré et al. (2012) and Guasch et al. (2016). Therefore, both dimensional ratings and discrete emotion ratings are available for the 3,141 words included in the datasets of Hinojosa, Martínez-García, et al. (2016), and Ferré et al. (2017). This will allow researchers to undertake comparative studies of the contribution of both perspectives to language processing.

Finally, emoFinder also includes two databases with non-emotional variables (Alonso, Fernández, & Díez, 2015; González-Nosti et al., 2014). Concerning the former, Alonso et al. (2015) provided subjective ratings of age of acquisition (AoA) for 7,039 Spanish words, a variable that has been related to word processing (see Juhasz, 2005, and Johnston & Barry, 2006, for reviews). In Spanish, there have been several normative studies on subjective AoA (Alonso, Díez, & Fernández, 2016; Alonso et al., 2015; González-Nosti et al., 2014; Hinojosa, Rincón-Pérez, et al., 2016; Moreno-Martínez, Montoro, & Rodríguez-Rojo, 2014). However, we have only included in emoFinder the datasets of Alonso et al. (2015) and Hinojosa, Rincón-Pérez, et al. (2016), because these two studies used a comparable 11-point rating scale, whereas the other studies used either a 7-point rating scale (González-Nosti et al., 2014; Moreno-Martínez et al., 2014) or no scale at all (Alonso et al., 2016).

The final dataset included in emoFinder is that of González-Nosti et al. (2014). It provides normative data on reaction times for a set of 2,765 words in an LDT. As we said in the introduction, the information on performance with large sets of words in a word recognition task (i.e., the LDT) can be of great value for our understanding of the most relevant variables in word processing. EmoFinder includes affective ratings for a large subset of these 2,765 words: ratings for valence, arousal, and dominance are available for 2,625, 2,625, and 647 words, respectively, and discrete emotion ratings are available for 1,101 words. These data can be very useful for the study of the contributions of affective variables to word recognition.

In sum, emoFinder includes subjective ratings for 16,375 different Spanish words on 14 potentially relevant variables, plus lexical decision times, drawn from ten different studies. It should be noted that the ratings of the words that overlap across the databases included in emoFinder are highly and positively correlated for valence (ranging from .90 to .99), arousal (ranging from .68 to .92), concreteness (ranging from .59 to .82), familiarity (ranging from .78 to .85), and AoA (.95). Importantly, after reviewing the most representative emotional databases in English, and those in other languages for which English translations are provided, we found a substantial number of word matches with emoFinder (see Table 3). This will allow researchers to conduct cross-language studies.
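
The cross-database agreement described above can be illustrated with a minimal sketch, assuming each database is represented as a plain dictionary mapping words to ratings. The function name `pearson_on_overlap` and the dictionary layout are hypothetical and are not part of emoFinder; the sketch only shows how a Pearson correlation restricted to the shared words could be computed.

```python
from math import sqrt


def pearson_on_overlap(db_a, db_b):
    """Return (r, n): Pearson correlation over the n words rated in both databases.

    db_a and db_b are hypothetical dicts mapping a word to its mean rating
    for the same variable (e.g., valence) in two different normative studies.
    """
    shared = sorted(set(db_a) & set(db_b))  # only overlapping words count
    n = len(shared)
    if n < 2:
        raise ValueError("at least two shared words are needed")
    xs = [db_a[w] for w in shared]
    ys = [db_b[w] for w in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("ratings are constant in one database")
    return cov / sqrt(var_x * var_y), n
```

Words that appear in only one of the two dictionaries are ignored, so the returned `n` indicates how many words the coefficient is actually based on.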

Table 3 International affective databases and their matches (in number and percentage of words) with emoFinder

Description of the emoFinder website

EmoFinder is a free tool that can be accessed at http://usc.es/pcc/emofinder. As we stated above, it currently includes ten different databases, but its architecture is highly scalable and it can easily be expanded to include additional databases in the future. EmoFinder offers two main functionalities in order to support two common needs of researchers: to find stimuli that match a group of characteristics, and to find the values of a given list of words within a set of variables. Both functionalities are integrated into the main screen (see Figs. 1 and 4).

Fig. 1 Screenshot of the main screen of emoFinder, with some variables and criteria selected

Regarding the first functionality, if the user needs to find a pool of words that match specific criteria (e.g., to look for words having a valence score within a particular range), the first step is to select the databases where emoFinder will search for the words. By clicking on each database name, emoFinder returns the reference of the source database. The user does not need to know in advance which variables are available for each database, because the variable list is updated dynamically according to the available variables in the selected databases (see Fig. 1).

Because all the variables are numerical, distinct search criteria are available for all of them. One possibility is to set the exact desired value (e.g., valence = 5). Additionally, emoFinder allows researchers to look for words that have a minimum value for a particular variable (e.g., ≥5), a maximum value (e.g., ≤5), or fall within a given range (e.g., 5 to 6.5). Furthermore, the user can also set the desired word length, or even establish a certain word pattern. EmoFinder accepts two different wildcards in the input of the word pattern field. The first is the “_” character, which stands for any single character. The second is the “%” character, which stands for any sequence of letters. By combining both it is possible to search for virtually any pattern of letter strings. For instance, it is possible to search for words beginning with a given letter (e.g., “a%”), ending with a given syllable (e.g., “%dor”), or even more complex patterns. For example, the pattern “_are%do” would retrieve words that have just one letter before the sequence “are,” and end with the sequence “do” (the results would be “mareado” [dizzy], “pareado” [couplet], and “parecido” [similar]). Since the same word can have different values for the same variable across databases, the options screen allows the user to choose whether the search conditions must be fulfilled in all databases in which the word appears or in at least one database (see Fig. 2). The former option, clearly, is far more restrictive than the latter. Moreover, setting a high number of search criteria at once can lead to searches with no matching results.
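
The wildcard and range criteria described above can be sketched as follows. This is an illustration only and not the emoFinder implementation: the function names are hypothetical, and the sketch assumes that “%” may also match an empty sequence (as in SQL `LIKE`), which the description does not specify.

```python
import re


def pattern_to_regex(pattern):
    """Translate a word pattern into a compiled regular expression.

    "_" matches exactly one character; "%" matches any sequence of
    characters (assumed here to include the empty sequence).
    """
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts))


def matches(word, pattern):
    """True if the whole word fits the wildcard pattern."""
    return pattern_to_regex(pattern).fullmatch(word) is not None


def satisfies_range(values, low, high, mode="all"):
    """Check a range criterion against one word's ratings across databases.

    values: the word's ratings for one variable, one entry per database.
    mode "all": every database must satisfy the criterion;
    mode "any": at least one database must satisfy it.
    """
    if not values:
        return False
    hits = [low <= v <= high for v in values]
    return all(hits) if mode == "all" else any(hits)
```

With these definitions, `matches("parecido", "_are%do")` holds, whereas `matches("preparado", "_are%do")` does not, because “are” does not follow the first letter. A word rated 5.2 in one database and 6.8 in another fails the range 5 to 6.5 under `mode="all"` but passes under `mode="any"`, which is why the former setting is the more restrictive one.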

Fig. 2 Detail of the Options screen

Once the search has been made, the results automatically appear at the bottom of the screen (see Fig. 3). A word fulfilling the search criteria is displayed in each row, together with all the values in the selected variables. Each column header indicates the variable name and the source database.

Fig. 3 Detail of the results obtained with the criteria selected in Fig. 1. (These are English translations of the words on the screen: acariciar = to caress; acogedora (fem.) = cozy; acogedor (masc.) = cozy; adorable = lovely; afortunado = lucky; agradecido = grateful; amabilidad = kindness; amanecer = dawn; amigable = friendly; amistoso = friendly; apasionado = passionate; aplaudir = to applaud; aplausos = applause; arcoíris = rainbow; aventura = adventure.)

The Results section in emoFinder is highly customizable to suit researchers’ needs. The results are initially sorted alphabetically by words, but all the results can be sorted according to each variable by clicking on the column header. Furthermore, columns can be deleted by clicking the “x” symbol, or arranged according to the desired order. To do this, the user has to previously rearrange the variables in the list by dragging the three-bar icon. Additionally, the options menu (see Fig. 2) allows the user to group the results by database rather than by variable. This same screen also makes it possible to change the separating symbol of the decimal (i.e., to set it as a point or as a comma), to adjust to the different regional configurations. The results can be easily copied onto the clipboard or exported to a comma-separated value (CSV) file.
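
As an illustration of why the decimal separator matters when exporting, the following minimal sketch writes result rows to CSV text with a configurable decimal symbol. The function names, the two-decimal formatting, and the row layout are assumptions made for this example; they do not describe how emoFinder itself produces its files.

```python
import csv
import io


def format_number(value, decimal_sep="."):
    """Format a rating with two decimals (an assumption) and the chosen separator."""
    return f"{value:.2f}".replace(".", decimal_sep)


def to_csv(rows, header, decimal_sep="."):
    """Serialize rows of (word, value, value, ...) to comma-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for word, *values in rows:
        writer.writerow([word] + [format_number(v, decimal_sep) for v in values])
    return buffer.getvalue()
```

When the decimal symbol is a comma, Python's `csv` writer quotes the affected fields so that they are not confused with the column delimiter; spreadsheet software configured for the corresponding regional settings can then read the values as numbers.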

Finally, to guarantee reasonable performance for many concurrent users, all queries are limited to 5,000 results. When the number of matching words in a search exceeds that limit, only a random subset of 5,000 words is returned, and the user is informed accordingly. Given this randomization, each run of the same query surpassing 5,000 results will provide a different set of words.
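
The result cap described above can be sketched as follows, assuming the matching words are held in a list. The names `MAX_RESULTS` and `cap_results` are hypothetical, and the sketch does not claim to reproduce the sampling procedure used on the emoFinder server.

```python
import random

MAX_RESULTS = 5000  # limit stated in the text


def cap_results(matching_words, limit=MAX_RESULTS, rng=None):
    """Return (words, truncated).

    If the query matches more than `limit` words, a random subset of
    exactly `limit` words is returned and `truncated` is True, so the
    caller can inform the user.
    """
    words = list(matching_words)
    if len(words) <= limit:
        return words, False
    if rng is None:
        rng = random.Random()  # unseeded: each run may draw a different subset
    return rng.sample(words, limit), True
```

Because the sample is drawn without replacement, no word is repeated within a single result set. A researcher who needs a reproducible stimulus list would therefore narrow the criteria until the query falls under the limit.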

The second functionality of emoFinder is to search for the values of a given list of words in the selected databases for particular variables. To perform the query, the user simply has to paste the list of words into the corresponding text box (see Fig. 4), one word per line.

Fig. 4 Screenshot of the main screen of emoFinder with a query for the values of some variables for specific words. (These are English translations of the words on the screen: beso = kiss; alegre = happy; amigo = friend; cariñoso = affectionate; feliz = happy; risa = laughter.)

When clicking the query button, the results are automatically retrieved in the same order as in the input list, to allow for easy integration with the experimenter’s spreadsheet (see Fig. 5), but the same sorting and arranging options that are available in the previous functionality are also available here.
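
The order-preserving behavior described above can be sketched in a few lines, assuming the ratings are available as a dictionary keyed by word. The function name `lookup_in_order` is hypothetical, and the numeric values in the usage note are placeholders invented for illustration, not ratings taken from any database.

```python
def lookup_in_order(words, ratings):
    """Return one (word, value) pair per input word, in the input order.

    Words with no entry get None, so the output always has the same
    number of rows as the pasted list and can be aligned with a spreadsheet.
    """
    return [(word, ratings.get(word)) for word in words]
```

For example, with the placeholder dictionary `{"beso": 1.0, "amigo": 2.0}`, the call `lookup_in_order(["amigo", "risa", "beso"], ...)` returns the rows for “amigo”, “risa”, and “beso” in that order, with `None` for “risa”. Keeping a row for unmatched words is a design choice of this sketch; it avoids misaligning the results with the experimenter's original list.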

Fig. 5 Detail of the results obtained with the query made in Fig. 4

Conclusion

The task of selecting stimuli for research in word processing often involves looking for word properties in several distinct databases, to control for all relevant variables. This is a laborious and error-prone task, especially when the number of available databases is large. Keeping this in mind, we developed emoFinder, a search engine that makes the most relevant affective and semantic properties available for 16,375 Spanish words, gathered from ten different normative databases. The tool provides a clean and intuitive interface with several functionalities to assist in this task, and will be particularly useful for researchers interested in the interplay between language and emotion.