The Nencki Affective Picture System (NAPS): Introduction to a novel, standardized, wide-range, high-quality, realistic picture database
Selecting appropriate stimuli to induce emotional states is essential in affective research. Only a few standardized affective stimulus databases have been created for auditory, language, and visual materials. Numerous studies have extensively employed these databases using both behavioral and neuroimaging methods. However, some limitations of the existing databases have recently been reported, including limited numbers of stimuli in specific categories or poor picture quality of the visual stimuli. In the present article, we introduce the Nencki Affective Picture System (NAPS), which consists of 1,356 realistic, high-quality photographs that are divided into five categories (people, faces, animals, objects, and landscapes). Affective ratings were collected from 204 mostly European participants. The pictures were rated according to the valence, arousal, and approach–avoidance dimensions using computerized bipolar semantic slider scales. Normative ratings for the categories are presented for each dimension. Validation of the ratings was obtained by comparing them to ratings generated using the Self-Assessment Manikin and the International Affective Picture System. In addition, the physical properties of the photographs are reported, including luminance, contrast, and entropy. The new database, with accompanying ratings and image parameters, allows researchers to select a variety of visual stimulus materials specific to their experimental questions of interest. The NAPS system is freely accessible to the scientific community for noncommercial use by request at http://naps.nencki.gov.pl.
Keywords: Emotion induction; Affective visual stimuli; Affective ratings; Picture database; Physical properties; Gender differences; International Affective Picture System; Nencki Affective Picture System
One of the most important tasks for experimenters when studying the influences of emotions on various cognitive processes (e.g., memory or attention) is the selection of appropriate and controlled stimuli for inducing specific emotional states (Gerrards-Hesse, Spies, & Hesse, 1994; Horvat, Popović, & Ćosić, 2012, 2013). Emotionally charged materials in different modalities (auditory, lexical, and visual) have been widely used in both behavioral and neuroimaging research for both healthy and clinical populations (Grabowska et al., 2011; Marchewka & Nowicka, 2007; Posner et al., 2009; Viinikainen, Kätsyri, & Sams, 2011). Currently, several sets of standardized, emotionally charged stimuli are freely available to researchers worldwide. These sets are standardized on the basis of either dimensional or discrete category theories of emotion (Barrett, 2006; Dalgleish, 2004; Ekman, 1992). Dimensional theories of emotion claim that affective experiences can be characterized by several fundamental dimensions. These dimensions might include valence, arousal (sometimes referred to as activation), dominance, and approach–avoidance (see Mauss & Robinson, 2009, for a review), and each dimension has its own range. Valence ranges from highly positive to highly negative, and arousal from excited/aroused to relaxed/unaroused (Lang, Greenwald, Bradley, & Hamm, 1993; Russell, 1980). Approach–avoidance, also known as “motivation direction,” ranges from tendency to approach to tendency to avoid a stimulus. Finally, dominance represents the degree of perceived control over the affective stimulus, and ranges from feeling in control to feeling out of control. Although the range is defined, there is disagreement as to whether approach and avoidance are more synonymous with positive or negative states (Watson, Wiese, Vaidya, & Tellegen, 1999). Furthermore, it has been suggested that motivational direction and affective valence are independent. 
This has been demonstrated for the case of anger: In spite of being a negative affective state, anger can nevertheless be associated with approach tendencies (Carver & Harmon-Jones, 2009; Gable & Harmon-Jones, 2010).
As compared to dimensional-category theories of emotion, discrete-category theories assume that the above-mentioned dimensions are too simple to accurately reflect the neural systems underlying emotional responses. Instead, discrete-category theories propose the presence of at least five basic universal emotions (e.g., happiness, anger, fear, disgust, and sadness), as was originally suggested by Darwin (1872).
Typically, to collect normalized ratings for arousal, dominance, or valence, the Self-Assessment Manikin (SAM) scale is employed (Bradley & Lang, 1994). Recent studies have also employed computer-based sliders that are moved along a gradually colored bar in order to collect ratings (Dan-Glauser & Scherer, 2011). For basic emotions, the most common approach is to directly ask participants to name emotions by using predefined labels with an indication of intensity level (Briesemeister, Kuchinke, & Jacobs, 2011; Fujimura, Matsuda, Katahira, Okada, & Okanoya, 2011; Mikels et al., 2005). A number of neuroimaging studies have shown distinct neuronal patterns related to ratings based on both dimensional- and discrete-category theories of emotions (Tettamanti et al., 2012; Viinikainen et al., 2011).
Emotionally charged stimuli and databases
When studying emotions, researchers can choose stimuli from existing standardized databases of auditory, verbal, and visual materials. The International Affective Digitized Sounds (IADS) is one of the most frequently used databases of emotionally charged auditory stimuli, with sounds being characterized according to emotional valence and arousal (Bradley & Lang, 1999), as well as to discrete emotional categories (Stevenson & James, 2008). Other sets of audio stimuli include the Montreal Affective Voices (Belin, Fillion-Bilodeau, & Gosselin, 2008), Portuguese sentences and pseudosentences for research on emotional prosody (Castro & Lima, 2010), vocal emotional stimuli in Mandarin Chinese (Liu & Pell, 2012), and musical excerpts (Vieillard, Peretz, Khalfa, Gagnon, & Bouchard, 2008).
Standardized emotionally charged verbal stimulus materials are also available for several languages. Ratings according to dimensional- and/or discrete-category theories have been collected for languages such as English (the Affective Norms for English Words [ANEW]; Bradley & Lang, 1999; Stevenson, Mikels, & James, 2007), German (Berlin Affective Word List [DENN–BAWL]; Briesemeister et al., 2011; Võ et al., 2009), Finnish (Eilola & Havelka, 2010), Spanish (Redondo, Fraga, Padrón, & Comesaña, 2007), and French (Bonin, Méot, Aubert, Niedenthal, & Capelle-Toczek, 2003).
Several databases of static emotional faces have also been developed, which consist of pictures of models or actors from various backgrounds. These databases include the following: the Karolinska Directed Emotional Faces (KDEF; Lundqvist, Flykt, & Öhman, 1998), using Caucasian models; the Japanese and Caucasian Facial Expressions of Emotion (JACFEE; Ekman & Matsumoto, 1993–2004), with Caucasian and Japanese models; the Montreal Set of Facial Displays of Emotion (Beaupre, Cheung, & Hess, 2000), incorporating French Canadian, Chinese, and sub-Saharan African models; and finally, the NimStim (Tottenham et al., 2009), which provides a uniform set of Asian-American, African-American, European-American, and Latino-American actors, all photographed under identical conditions.
Finally, at present three databases contain static visual affective stimuli with various content and validated normative ratings¹: the International Affective Picture System (IAPS; Lang, Bradley, & Cuthbert, 1999), the Geneva Affective Picture Database (GAPED; Dan-Glauser & Scherer, 2011), and the Emotional Picture System (EmoPicS; Wessa et al., 2010).
IAPS is the most widely used database of natural pictures of emotionally charged stimuli. Numerous cross-validation studies have shown the reliable induction of expressive and physiological emotion responses by these stimuli (Greenwald, Cook, & Lang, 1989; Lang et al., 1993; Modinos et al., 2012; Weinberg & Hajcak, 2010). The original norms and their updates (Lang, Bradley, & Cuthbert, 2008) were created according to a dimensional-category theory of affect, including valence, arousal, and dominance. The data set has also been characterized to some extent using a discrete-category theory of emotion (Davis, Rahman, Smith, & Burns, 1995; Mikels et al., 2005). Hundreds of behavioral and neuroimaging studies have been conducted using IAPS. However, as has been pointed out, certain issues relate to the use of this database (Colden, Bruder, & Manstead, 2008; Dan-Glauser & Scherer, 2011; Grabowska et al., 2011; Mikels et al., 2005). One constraint is the limited number of pictures belonging to specific content categories. This might lead to situations in which participants are presented with the same materials twice: for example, when the participants must be recruited from limited or specific cohorts (especially in the case of fMRI studies). As a consequence, the power of the emotional induction might be lowered. Similarly, if one wants to study reactions to “new” emotionally charged stimuli and their influence on cognitive processes (e.g., to study the old–new effect, repetition effect, or false recognition) (Marchewka, Jednoróg, Nowicka, Brechmann, & Grabowska, 2009; Michałowski, Pané-Farré, Löw, Weymar, & Hamm, 2011; Rozenkrants, Olofsson, & Polich, 2008), the number of images in a particular category should be large enough to avoid uncontrolled stimulus repetition. Moreover, the quality of IAPS images is not always satisfactory, which might introduce uncontrolled factors in the experimental design. 
This is especially the case when photographs in one category have significantly poorer quality than those in others. Several studies have also shown that the physical properties of the image, such as size, luminance, and complexity, might influence the affective processing of visual stimuli (Bradley, Hamby, Löw, & Lang, 2007; Codispoti & De Cesarei, 2007; Nordström & Wiens, 2012; Olofsson, Nordin, Sequeira, & Polich, 2008; Wiens, Sand, & Olofsson, 2011). Last, but not least, it has been shown that content category matters. For example, social versus nonsocial photographs elicit different behavioral and neural responses (Colden et al., 2008; Wiens et al., 2011).
The GAPED (Dan-Glauser & Scherer, 2011) database was recently introduced to increase the availability of visual emotion stimuli, and it can be divided into six categories. Negative pictures are divided into four specific content categories: spiders, snakes, and scenes that induce emotions related to the violation of moral or legal norms (human rights violations or animal mistreatment). Positive pictures represent mainly human and animal babies and nature scenery, whereas neutral pictures mainly depict inanimate objects. GAPED can be particularly useful in studies evaluating phobic reactions (Aue, Hoeppli, & Piguet, 2012) or other research dealing with spider and snake presentations or militant-related exposure, in which multiple presentations of stimuli of the same type are required. The main limitation of this database is its asymmetry—it contains many more negative than positive pictures, and the content of negative pictures is more specific. This makes it difficult to balance content across valences. Another limitation is that the pictures are relatively small—640 × 480 pixels.
The EmoPicS database (Wessa et al., 2010) was developed as a supplement to IAPS and provides an additional pool of validated emotion-inducing pictures for scientific studies. However, the database is relatively small, including a total of only 378 affective photographs with different semantic content (a variety of social situations, animals, and plants) selected from public Internet photo libraries and archives. The images have a resolution of 800 × 600 (landscape orientation only), which is significantly lower than the typical resolutions encountered in digital photography and display today (1,600 × 1,200). In addition to the dimensional category ratings of valence and arousal, the authors have also provided the physical parameters of each picture, including luminance, contrast, and color composition. This is a useful feature of the database, since the physical properties of pictures can influence early stages of visual processing (Chammat, Jouvent, Dumas, Knoblauch, & Dubal, 2011).
New database - The Nencki Affective Picture System
Taking into consideration the constantly growing number of behavioral and neuroimaging studies on emotion, we anticipate a demand for additional affective pictorial databases that provide researchers with information about the physical properties of the stimuli. In the present work, we provide high-quality emotionally charged photographs that are grouped into five general categories. For each picture, we have collected ratings of valence, arousal, and motivational direction (avoidance–approach), using the dimensional-category theory of emotions that has also been employed for previous databases (IAPS, GAPED, EmoPicS). Additional physical qualities of the pictures, including luminance, contrast, and color composition, are also provided. This newly developed set of photographs, called the Nencki Affective Picture System (NAPS), is available to the scientific community for noncommercial use.
A total of 204 healthy volunteers took part in the study (119 women, 85 men; mean age = 23.9 years, SD = 3.4). The participants were mainly college students and young employees recruited from the University of Warsaw and the Nencki Institute of Experimental Biology. Sixty percent of the participants were Polish (N = 123); the rest belonged to other, mostly European nationalities (exchange students). The local Research Ethics Committee in Warsaw approved the experimental protocol of the study, and written informed consent was obtained from all participants prior to the study.
Rating scales and stimuli presentation
Before the experimental session, the participants were given details about the contents of the images and familiarized themselves with the dimensions through the use of example stimuli. In addition, participants were informed that if they felt any discomfort during the session, they should report it immediately in order to stop the experiment.
Each participant was presented with 362 images chosen pseudorandomly from all of the categories, with the constraint that no more than three stimuli of one category were presented in succession. In all, 12 different sets of stimuli were prepared on the basis of this rule. On average, 55 ratings were collected for each picture. The sessions started with an instructional screen and 12 practice trials, with a longer time limit for the first seven of these trials. In the main experiment, each picture was presented in full-screen view for 3 s. After the first presentation of each stimulus, rating scales were displayed on a new screen to the right, and a smaller version of the image was presented on the left side of the screen. The small picture and rating scales remained available to the participant until she or he had completed all three ratings. The participants had 3 s to complete ratings on each dimension, amounting to 9 s in total. After the participant had completed all ratings, the offset picture and scale disappeared and were immediately replaced by the next picture in the series.
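The ordering constraint described above (no more than three consecutive stimuli from one category) can be sketched as follows. This is a minimal illustration in Python, not the software used in the study: the function names, the greedy-with-restart strategy, and the toy category counts are assumptions introduced here.

```python
import random


def max_run_length(labels):
    """Return the length of the longest run of identical consecutive labels."""
    longest = current = 0
    previous = object()  # sentinel that equals no real label
    for label in labels:
        current = current + 1 if label == previous else 1
        longest = max(longest, current)
        previous = label
    return longest


def constrained_order(items, max_run=3, rng=None, max_restarts=1000):
    """Shuffle (image_id, category) pairs so that no category appears
    more than `max_run` times in succession.

    Strategy (an assumption, not the study's procedure): build the
    sequence one item at a time, drawing at random from the items that
    would not extend a full run; restart if a dead end is reached.
    """
    rng = rng or random.Random()
    for _ in range(max_restarts):
        remaining = list(items)
        ordered = []
        while remaining:
            tail = [cat for _, cat in ordered[-max_run:]]
            full_run = len(tail) == max_run and len(set(tail)) == 1
            blocked = tail[0] if full_run else None
            candidates = [i for i, (_, cat) in enumerate(remaining)
                          if cat != blocked]
            if not candidates:
                break  # dead end: only the blocked category is left
            ordered.append(remaining.pop(rng.choice(candidates)))
        else:
            return ordered  # every item was placed
    raise RuntimeError("no valid ordering found; relax the constraint")


# Toy example with hypothetical, equal category sizes.
categories = ["people", "faces", "animals", "objects", "landscapes"]
toy_items = [(f"{cat}_{i:02d}", cat) for cat in categories for i in range(10)]
toy_order = constrained_order(toy_items, max_run=3, rng=random.Random(1))
print(max_run_length([cat for _, cat in toy_order]))
```

As a side check on the reported numbers, 204 participants × 362 images / 1,356 pictures ≈ 54.5 ratings per picture, in line with the stated average of about 55.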
Three continuous bipolar semantic sliding scales were shown, each ranging from 1 to 9. Participants indicated their ratings by moving a bar over a horizontal scale using a standard computer mouse. On the valence scale, participants were asked to complete the sentence, “You are judging this image as . . .” (from 1 = very negative to 9 = very positive, with 5 = neutral). Next, participants judged motivational direction by completing the sentence, “My reaction to this image is . . .” (from 1 = to avoid to 9 = to approach, with 5 = neutral). Finally, participants judged the degree of arousal elicited by pictures with the introductory sentence, “Confronted with this image, you are feeling: …” (from 1 = relaxed to 9 = aroused, with 5 = neutral/ambivalent).
We decided to use semantic bipolar scales in the present study because it has been shown that the SAM arousal scale may lead to misinterpretations (Ribeiro, Pompéia, & Bueno, 2005). In the original technical manual of IAPS and SAM (Lang et al., 1999), the description of one of the extremes of the arousal scale uses the terms relaxed, calm, sluggish, and unaroused. However, the affective space obtained from stimuli in American (Lang et al., 1999) and Spanish (Moltó et al., 1999; Vila et al., 2001) populations showed that the standardized rating is “boomerang-shaped,” with one extreme of the arousal scale being referred to as no reaction. As a result, this extreme anchor of the scale was used only for neutral photographs, whereas the opposite extreme was used to describe both pleasant and unpleasant pictures (arousing, value = 9). On the other hand, Brazilians (Ribeiro et al., 2005) and Germans (Grühn & Scheibe, 2008) interpreted the SAM arousal scale differently, and attributed less arousal to pleasant photographs and more arousal to neutral and negative ones. Pleasant images of landscapes, flowers, and babies were rated as being relaxing and calming. This led to a more linear distribution of scores in the affective space.
The present experiment lasted approximately 1 h. An obligatory 10-min break was taken after half of the stimuli had been presented, during which participants were asked to leave the experimental room. The study was conducted on standard PC computers using 24-in. LCD monitors. The core software for stimulus presentation and data acquisition was created using Presentation software (Version 14.6, www.neurobs.com). All responses were analyzed further using the statistical package SPSS (2009).
Descriptive statistics, calculated separately for each dimension in women, men, and both groups, for all NAPS photographs
[Figure panels, both groups: Arousal; Valence; AvAp (approach–avoidance)]
Descriptive statistics, calculated separately for each dimension and category in women and men
Correlation analyses of emotional dimensions for gender and content categories
The importance of sex differences has been documented in cognitive processes such as memory, emotion, and vision (see Cahill, 2006, for a review). It has been shown that the same visual stimuli may elicit different levels of arousal and valence in males than in females. Relative to men, women react more strongly to unpleasant materials. They rate IAPS pictures as being more unpleasant and arousing, and react with higher corrugator electromyographic activity and greater event-related potential amplitudes (Bradley et al., 2001; Lithari et al., 2010; McManis et al., 2001). On the other hand, men tend to rate pleasant pictures, especially erotica, as more pleasant and more arousing than women do, and show significantly greater electrodermal activity. Finally, pleasant and unpleasant IAPS stimuli have been shown to activate different neuronal structures in women and men (Wrase et al., 2003).
Correlation coefficients resulting from correlations of ratings of valence, arousal, and approach–avoidance with one another, listed for each category and gender
All dimensions were highly correlated in both men and women (all ps < .001; see Table 3 for the correlation coefficients). However, women had higher correlation coefficients than did men in all cases, except for the correlations between valence and approach–avoidance for animals and landscapes. In the case of the correlations between valence and arousal, the differences between women and men were the strongest for objects (z = –5.91, p < .001, effect size = 0.46) and people (z = –3.88, p < .001, effect size = 0.35), and the weakest for landscapes (z = –2.69, p = .007, effect size = 0.28). Similarly, in the case of arousal correlated with approach–avoidance, the strongest difference between genders was visible for objects (z = –6.94, p < .001, effect size = 0.54), and the weakest for landscapes (z = –2.75, p = .006, effect size = 0.29). For the correlations between valence and approach–avoidance, differences were found for objects (effect size = 0.29), people (effect size = 0.31), and faces (effect size = 0.24) (3.29 ≤ z ≤ 3.79, ps ≤ .001).
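The text does not name the test behind the z values and effect sizes reported above. One common way to compare two independent correlation coefficients is Fisher's r-to-z transformation, with Cohen's q (the difference between the transformed coefficients) as the effect size. The sketch below assumes that approach; the function name and the input values are hypothetical and are not taken from Table 3.

```python
import math
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """Compare two correlations from independent samples.

    Assumes Fisher's r-to-z transformation (atanh). Returns the z
    statistic, the two-tailed p value, and Cohen's q (absolute
    difference of the transformed coefficients).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    difference = z1 - z2
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = difference / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value, abs(difference)


# Hypothetical inputs: r and n for two groups (n = number of rated pictures).
z_stat, p_value, q = compare_independent_correlations(-0.80, 200, -0.60, 200)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}, q = {q:.2f}")
```

Under this convention, a negative z indicates that the first coefficient is more negative than the second, which would correspond to a stronger negative valence–arousal relationship in the first group.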
Physical properties of images
The properties of each image were computed with Python-based (www.python.org) in-house software using SciPy (Version 0.10.1, www.scipy.org) and the Python Imaging Library (for JPEG compression; Version 1.1.7). Luminance was defined as the average pixel value of the grayscaled image, and the contrast was defined as the standard deviation across all pixels of the grayscaled image (Bex & Makous, 2002). JPEG size can be used as an index of the overall complexity of an image (Donderi, 2006). Perceptually simple images are highly compressible, therefore resulting in smaller file size. The JPEG sizes of the color images were determined with a compression quality setting of 80 (on a scale from 1 to 100). As an additional index of image complexity, the entropy of each grayscaled image was determined. Entropy, H, is computed from the histogram distribution of the 8-bit gray-level intensity values x: H = –Σp(x)log p(x), where p represents the probability of an intensity value x. Entropy varies with the “randomness” of an image—low-entropy images have rather large uniform areas with limited contrast (e.g., a dark sky), whereas images with high entropy are images that are more “noisy” and have a high degree of contrast from one pixel to the next. In addition, each picture was converted to the CIE L*a*b* color space. This space, unlike RGB color space, is based on the opponent-process theory of color vision and approximates characteristics of the human visual system. In this system, the L* dimension corresponds to luminance (range: 0–100), and a* and b* correspond to two chromatic channels ranging from red (positive values) to green (negative values), and from blue (negative values) to yellow (positive values) (Tkalcic & Tasic, 2003). For each image and channel, the mean across all pixels was calculated. 
For example, a high positive value in the a* dimension indicates that a particular picture contains a large amount of “red color.” Values for the different physical properties for each picture are listed in Table S1 of the supplemental materials.
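The definitions above can be illustrated with a short, dependency-free Python sketch. It is not the in-house software used for NAPS. The function names are hypothetical, and the choices of base-2 logarithm, population standard deviation, sRGB input, and D65 reference white are assumptions, since the text does not specify them. JPEG file size is omitted because it requires an image codec.

```python
import math
from collections import Counter
from statistics import fmean, pstdev


def gray_stats(gray):
    """Luminance, contrast, and entropy of a grayscale image.

    `gray` is a 2-D list of 8-bit intensities (0-255).
    """
    pixels = [value for row in gray for value in row]
    luminance = fmean(pixels)   # average pixel value
    contrast = pstdev(pixels)   # SD across all pixels (population SD assumed)
    total = len(pixels)
    # H = -sum p(x) log p(x) over the gray-level histogram (base 2 assumed)
    entropy = -sum((count / total) * math.log2(count / total)
                   for count in Counter(pixels).values())
    return luminance, contrast, entropy


def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (D65 white assumed)."""
    def to_linear(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def mean_lab(rgb_pixels):
    """Mean L*, a*, and b* across an iterable of (r, g, b) pixels."""
    labs = [srgb_to_lab(*pixel) for pixel in rgb_pixels]
    return tuple(fmean(channel) for channel in zip(*labs))


# Toy 2 x 2 checkerboard: half black, half white.
checkerboard = [[0, 255], [255, 0]]
print(gray_stats(checkerboard))
# A reddish toy image yields a positive mean a* value.
print(mean_lab([(200, 40, 40), (220, 60, 50)]))
```

In this sketch, a uniform image has zero contrast and zero entropy, whereas the two-level checkerboard has an entropy of 1 bit. This matches the description of low-entropy images as having large uniform areas.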
In order to directly compare the slider ratings obtained with the methodology applied in the present study to the ratings obtained using the SAM scale (Lang et al., 1999), two additional experiments were conducted. In these experiments, a subset of images from the NAPS (n = 48) and IAPS (n = 48) databases were chosen. First, the IAPS pictures were selected to cover the whole affective space (excluding erotic images), and then NAPS pictures were matched to them for content: landscapes, smiling faces, objects, snakes, wild animals, accidents, mutilated faces, and so forth. The full list of images is presented in the supplemental materials as Table S2.
A total of 96 images for each study were presented pseudorandomly with respect to their content and the source of each image. The images were displayed for 3 s, after which the small picture and rating scales remained available to the participant until she or he had completed all three ratings. The images from NAPS were downsampled to match the lower resolution of the images from IAPS. Two separate procedures were conducted on different groups of volunteers. In the first study, 14 participants (eight women, six men; mean age = 23.5 years, SD = 1.4) underwent the procedure already described with slider scales, but only with the valence and arousal scales. In the second study, 14 participants (eight women, six men; mean age = 23.7 years, SD = 1.2) rated the stimuli using a computerized version of the SAM, for the valence and arousal scales only. A power analysis indicated that a sample size of 13 was sufficient to detect a significant correlation (effect size = 0.7) with a power of .80 and an alpha of .05. Both studies were conducted using English instructions and scale descriptions. The participants assigned to each group were matched for age and education. They were mostly Polish and European students of the Warsaw International Studies in Psychology program.
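The power analysis mentioned above can be approximated with the Fisher z method for a correlation coefficient. The text does not state which procedure or software was used, nor whether the test was one- or two-tailed, so the sketch below is an assumption-laden illustration with a hypothetical function name. The approximation tends to be slightly conservative, so its result may differ by about one participant from the reported value of 13.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate sample size needed to detect a correlation of size r.

    Uses the Fisher z approximation:
        n = ((z_alpha + z_power) / atanh(r)) ** 2 + 3
    """
    normal = NormalDist()
    tail_alpha = alpha / 2.0 if two_tailed else alpha
    z_alpha = normal.inv_cdf(1.0 - tail_alpha)
    z_power = normal.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


# Expected effect size from the text: r = 0.7, power = .80, alpha = .05.
print(n_for_correlation(0.7))
```

Smaller expected correlations require far larger samples under the same settings, which is why a design of this size is adequate only when a strong relationship between rating methods is expected.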
Correlations for a subset of 48 IAPS pictures
For the NAPS pictures, we also obtained strong correlations between the ratings obtained with the slider scale and SAM, for both valence (r = .962, p < .001) and arousal (r = .745, p < .001). Again, the correlation between valence and arousal was slightly, but not significantly, higher for the ratings obtained using the slider scale (r = –.747, p < .001) than for those with the SAM (r = –.649, p < .001).
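The validation above rests on Pearson correlations between the mean ratings that each picture received under the two methods. A minimal sketch of that computation follows. The function name is hypothetical and the ratings are invented toy values for illustration only; they are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented per-picture mean valence ratings (1-9 scale), one value per picture.
slider_valence = [2.1, 3.4, 5.0, 6.8, 7.9]
sam_valence = [1.8, 3.9, 5.2, 6.5, 8.1]
print(round(pearson_r(slider_valence, sam_valence), 3))
```

Each list position corresponds to one picture, so the coefficient reflects agreement between methods at the level of picture means rather than individual participants.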
In the present study, we have presented a comprehensive battery of static, emotionally charged and emotionally neutral visual stimuli that are available for use by the scientific community. Following empirical suggestions to divide emotionally charged stimuli into meaningful content categories (Weinberg & Hajcak, 2010), the database provides images for five content categories—people, faces, animals, objects, and landscapes—which distinguishes it from other databases. The database should facilitate examination of the influence of context on emotional elicitation by providing researchers with some control over stimuli. All pictures belonging to each category are of high quality, with a minimum resolution of 1,600 by 1,200 pixels.
We examined the influence of gender on the correlations between valence, arousal, and approach–avoidance for different categories of pictures. Taking into account the differences in the correlation coefficients between genders, together with the scatterplots, correlations between arousal and valence were stronger in women than in men, especially for pictures of people, faces, and objects. This finding might reflect a rating bias mentioned by Bradley and Lang (2007) for the IAPS pictures, which was particularly evident for unpleasant pictures (Schaaff, 2008). It seems that women, more than men, are biased for pictures depicting humans and objects. In other words, they tend to primarily rate unpleasant pictures as more arousing. In line with electrophysiological studies, women show greater event-related potential amplitudes for unpleasant and highly arousing stimuli than do men (Lithari et al., 2010).
In the present study, we used a computerized slider that was moved along a gradually colored bar to obtain a 9-point rating scale for several dimensions. To validate our methodology, we conducted two additional experiments using subsets of images from NAPS and IAPS to compare the ratings obtained with the slider scale to ratings obtained with a computerized version of the SAM. For both the valence and arousal dimensions, we found strong correlations between the ratings collected using the slider scale and SAM. These correlations were significant for the SAM ratings gathered in the present sample, as well as for previously obtained norms (Lang et al., 2008; Libkuman et al., 2007). In contrast to the SAM, the semantic slider scale produced a more linear relationship between valence and arousal. This finding was most probably due to the fact that the arousal scale was more bipolar in the case of the slider scale (going from relaxed to aroused) than of the SAM (from unaroused to aroused: Ribeiro, Teixeira-Silva, Pompéia, & Bueno, 2007). This might explain why the affective space of NAPS does not have the boomerang shape seen in the affective space of IAPS. Alternatively, adding highly arousing, pleasant (erotic) images to NAPS might change the distribution of ratings in the affective space. Researchers using NAPS should be aware of the fact that this procedure, as is true of the GAPED rating method (Dan-Glauser & Scherer, 2011), may potentially give rating values different from those on the SAM scale. Therefore, we suggest that researchers conduct separate ratings based on the SAM for studying specific sets of images from NAPS if they want to directly compare their results to those obtained using IAPS. 
We also advise researchers to employ an additional rating procedure when pulling together images from different visual affective static image sets (GAPED, EmoPicS, or IAPS) or when controlling the physical values of the images for studies using electroencephalography (Wiens et al., 2011).
An additional limitation of NAPS is that the present version of the database lacks very positive (high-valence) pictures with high arousal (e.g., pictures with erotic content). However, we are in the process of adding these images to the database. Further analysis aimed at segregating basic emotions (Mikels et al., 2005) within NAPS is also in progress. Images without standardized ratings can also be provided to researchers on request. The database, together with the dimensional ratings and physical properties of each image, is available to the scientific community, for noncommercial use only, on request.
¹ Note that additional stimulus databases are available to the scientific community. Here, we focus on those that have most often been employed and that have been validated according to the emotional theories. For more information on neutral and emotionally charged stimuli, see the website www.cla.temple.edu/cnl/STIMULI/index.html.
We are grateful to Anna Jaworek for helpful comments on the construction of NAPS and the interpretation of the results. We are also grateful to Katarzyna Paluch, Małgorzata Wierzba, and Łukasz Okruszek for participant recruitment. This study was supported by the Polish Ministry of Science and Higher Education, Iuventus Plus Grant Nos. IP2010 024070 and IP2011 033471. The authors have declared that no competing interests exist.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.