Introduction

The International Affective Picture System (IAPS; Bradley & Lang, 2007) is a set of images used in psychological research to elicit emotions. The IAPS developers suggest that it can be used cross-culturally, and the instrument’s psychometric properties have been verified in many different settings, including in high-income countries (e.g., France, Germany, Spain; Barke et al., 2012; Bungener et al., 2016; Grühn & Scheibe, 2008; Moltó et al., 1999, 2013) and in low- and middle-income countries (LMICs; e.g., Brazil, India, Iran, Mexico; Gantiva et al., 2019; Lasaitis et al., 2008; Lohani et al., 2013; Romo-González et al., 2018; Vélez-García et al., 2003; Zamani, 2017).

Nonetheless, several studies recommend using country-specific normative standards when interpreting IAPS ratings of individuals residing outside of the United States, where the instrument was developed and where the original normative datasets were collected (Barke et al., 2012; Deák et al., 2010; Soares et al., 2015). However, almost all the extant studies, including the original study describing the instrument’s development and standardization, have used samples consisting exclusively of university students, and none consider the influence of variations in education level and socioeconomic status (SES) on IAPS ratings. In this study, we modified the IAPS for use in a representative LMIC (South Africa) and sampled individuals from a range of socioeconomic and educational backgrounds to develop norms based on both the original and a modified IAPS.

The IAPS is a pictorial database comprising 1182 high-resolution pictures, each of which depicts some aspect of the human experience. It is commonly used to assess a two-dimensional model of emotion, focused on the components of valence and arousal (Bradley & Lang, 2007; Drače et al., 2013; Dufey et al., 2011; Lang et al., 1997). Valence denotes the evaluation of a stimulus and can range from pleasure (e.g., happiness, contentment, amusement) to displeasure (e.g., worry, revulsion, sadness). Arousal denotes the degree of activation in response to a stimulus, and ranges from calm, dull, or sleepy, to jittery, fearful, or excited. Each picture in the database has a standardized valence and arousal rating, based on normative data collected by the developers (Lang et al., 2008).

The IAPS has been applied in diverse contexts to investigate emotional processing. Generally, cross-cultural studies report highly significant correlations between local participants’ mean valence picture ratings and corresponding US normative standards (see, e.g., Deák et al., 2010; Drače et al., 2013; Ribeiro et al., 2005), suggesting congruence between participant responses and normative standards for valence. However, many cross-cultural studies find that arousal ratings of local participants are significantly higher than US normative standards (see, e.g., Dufey et al., 2011; Lasaitis et al., 2008). These differences have been attributed to the concept of arousal being less universal than that of valence (Lohani et al., 2013). Whereas valence as a construct is relatively easy to explain and to depict pictorially, arousal is more opaque. Furthermore, cultural modulation of arousal (i.e., the extent to which calmness or excitability is expressed) is stronger than that of valence. For example, Swedish participants had consistently lower arousal ratings than Italian participants for specific sets of IAPS images (Bradley & Lang, 2007). These differences are consistent with cultural stereotypes of these countries: Swedish individuals are typecast as calmer whereas Italian individuals are typecast as more emotional and excitable.

Studies testing the cross-cultural application of the IAPS have, for the most part, been conducted using samples of university students (see, e.g., Deák et al., 2010; Drače et al., 2013; Dufey et al., 2011; Silva, 2011). These studies, although emerging from different countries, all sample from a niche sector of the population that is characterized by a homogeneously high level of education. In LMICs, those samples are not representative of a country’s general population, an issue that is particularly pertinent because there is within-country variability on measures of emotion (Jakoby, 2016).

Furthermore, university students typically fall into a higher-income range than the general population. Studies show that both education and SES influence emotion expression and recognition (Côté et al., 2010). For example, Silverman et al. (2009) found that, in response to positively valenced stimuli and compared to those with a higher SES, low-SES individuals had significantly reduced neuronal activation in several brain regions associated with emotion processing. Therefore, homogeneity in these variables is a limitation of this literature and could account for the high similarities and small differences observed across norming studies.

The current study

Although published literature suggests that the IAPS is reliable and valid in diverse contexts, these studies have (a) shown that arousal ratings are not equivalent in different settings and differ from the original US normative standards, (b) primarily recruited students only, and (c) not considered variability in education and SES in their normative data. Furthermore, no IAPS norming studies have been conducted in Africa, even though 17% of the world’s population resides on that continent (Ritchie, 2019).

The current study had three broad aims. First, we aimed to test the cross-cultural applicability of the IAPS by administering it to a diverse sample of South African adults. Second, we aimed to develop a version of the IAPS suited for administration to South Africans of higher and lower SES and education. We envisioned this version to be easily compared with the original version (and therefore comparable with a broad range of international studies). Finally, we aimed to investigate the effects of education and SES on image valence and arousal ratings.

To achieve these aims we recruited (a) university students as well as participants with lower levels of education, and (b) higher- and lower-SES participants. Variables of education and SES are especially important in South Africa because this country has one of the most unequal education and SES distributions in the world (Byanyima, 2014; Spaull, 2013).

Using this sample, we tested these specific hypotheses: (1) the original IAPS stimuli will evoke valence and arousal scores significantly different from US normative standards; (2) the modified IAPS stimuli will evoke valence and arousal scores similar to US normative ratings for the original IAPS; and (3) participants with higher levels of SES and education will have IAPS valence and arousal ratings that are (a) significantly different from those with lower levels of SES and education, and (b) significantly more similar to US normative ratings. After testing this third hypothesis, we developed a set of regression-based norms for the original IAPS and for its modified version that captured the relative contributions of sociodemographic variables to valence and arousal ratings.

Method

Participants

Power analysis

G*Power software (Faul et al., 2009) suggested that, with basic parameters set at α = .05 and effect size (Cohen’s d) = 0.5, our sample of 150 participants was adequate to achieve statistical power of > .90 for all our analyses.

Recruitment

We used purposive and snowball sampling to recruit adult volunteers (aged 18–30 years) from university campuses and surrounding communities, informal settlements, and medium-sized places of employment in two major South African urban centres. As Fig. 1 shows, we screened 977 individuals to arrive at a final sample of 150. The number of individuals screened far exceeded the number who provided data for the final sample because we (a) strictly applied the below-listed eligibility criteria; (b) required equal numbers of men and women; (c) only included individuals with both high SES and > 12 years of completed education, or with both low SES and ≤ 12 years of completed education (i.e., we excluded, for instance, all individuals with low SES and > 12 years of education); and (d) tried to ensure, given considerations around the racial representation within picture sets and possible own-race bias in facial recognition (Chiroro et al., 2008; Zhang et al., 2015), that low- and high-SES Black African, Coloured (mixed ancestry), and White individuals were represented equally.

Fig. 1

Recruitment, attrition, and final sample structure. Note. All participants classed as ‘higher SES and education’ had completed at least some tertiary education. All participants classed as ‘lower SES and education’ had completed 12 or fewer years of education. Regarding race, all participants self-identified as one of these three groups, as defined by the most recent South African Population Census (Statistics South Africa, 2016). We classified individuals as being of a low SES if they reported (a) a monthly income within the range ZAR 0–14,999; (b) ≤ 12 years of completed formal education for both parents; and (c) their head of household as having a Hollingshead (1975) occupational status of ≤ level 3 (i.e., at or below administrative personnel/managers/owners or sole proprietors of small businesses). We classified individuals as being of a high SES if they reported (a) a monthly income ≥ ZAR 31,300; (b) > 12 years of completed formal education for both parents; (c) their head of household as having a Hollingshead (1975) occupational status of > level 3 (i.e., at or better than clerical and sales, technicians); and (d) employing a domestic worker or owning a car.

Eligibility criteria

We sought to include equal numbers of men and women because there are significant sex differences in patterns of affective valence and arousal, with men tending to rate pictures as more pleasant and women tending to rate pictures as more arousing, with the exception of those depicting erotica, which men tend to experience as being more arousing (Bradley & Lang, 2007; Silva, 2011; Soares et al., 2015; Wierzba et al., 2015). We limited our age range to 18–30 years so that our data would be comparable to those from previous IAPS normative studies (e.g., Bungener et al., 2016; Grühn & Scheibe, 2008).

Individuals who self-reported clinically significant symptoms of major depressive disorder (MDD), posttraumatic stress disorder (PTSD), and/or a substance use disorder (SUD) were excluded. These psychiatric disorders are associated with biases in emotional reactivity and emotion processing (McTeague et al., 2020). We also excluded those currently prescribed psychotropic medication.

Materials and procedure

Developing the South African Affective Picture System (SA-APS)

We first sorted the entire IAPS database by valence and then, separately, by arousal, using the published normative standards (Bradley & Lang, 2007). Pictures with valence ratings between 1 and 3.99 were classified as unpleasant, between 4 and 5.99 as neutral, and between 6 and 9 as pleasant. Pictures with arousal ratings between 1 and 3.99 were classified as calm, between 4 and 5.99 as neutral, and between 6 and 9 as aroused. We then used an online random number generator to select 68 pictures in each of these categories: pleasant-aroused, pleasant-calm, neutral, unpleasant-calm, and unpleasant-aroused. (See Supplemental Material for these 340 pictures.)
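The banding rule just described can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used: the function names are hypothetical, and the sketch assumes that normative means are reported to two decimal places, so that the bands 1–3.99, 4–5.99, and 6–9 are equivalent to cut-points at 4 and 6.

```python
def classify_rating(mean_rating, low_label, mid_label, high_label):
    """Map a mean rating on the 1-9 scale to one of three bands.

    Assumes two-decimal ratings, so '< 4' matches the 1-3.99 band
    and '< 6' matches the 4-5.99 band described in the text.
    """
    if not 1 <= mean_rating <= 9:
        raise ValueError("rating must lie on the 1-9 scale")
    if mean_rating < 4:
        return low_label
    if mean_rating < 6:
        return mid_label
    return high_label


def classify_valence(mean_valence):
    # Labels follow the text: unpleasant, neutral, pleasant.
    return classify_rating(mean_valence, "unpleasant", "neutral", "pleasant")


def classify_arousal(mean_arousal):
    # Labels follow the text: calm, neutral, aroused.
    return classify_rating(mean_arousal, "calm", "neutral", "aroused")
```

For example, under these assumptions a picture with a normative valence of 3.99 falls in the unpleasant band, whereas one with a valence of 4.00 falls in the neutral band.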

Next, we replaced some IAPS pictures with images more appropriate to the South African context. Replacement decisions were made based on two primary factors: in-group biases with regard to facial recognition (Meissner & Brigham, 2001; Zhang et al., 2015) and cultural biases with regard to objects and scenes. More than half of IAPS stimuli include human subjects, and facial expressions of these subjects serve as important cues in emotion elicitation. Furthermore, many stimuli present events or occurrences (e.g., skiing) that are encountered infrequently by most South Africans. After this evaluation process, we replaced 252 IAPS pictures while retaining 88 whose content was likely to be understood in a similar manner cross-culturally.

We then conducted a pilot study in which we asked 29 low-SES South African adults to rate the original 340 IAPS pictures and the 252 replacement pictures. We recruited a low-SES sample for the pilot study for two main reasons. First, because the IAPS normative sample was recruited from a relatively high-SES population in the global north, low-SES South Africans are likely to have ratings different from that standardization group. Second, because of starkly unequal distributions in wealth in South Africa, statistically most individuals are likely to be classified as being of low SES. To assess whether the images evoked the same emotions in South African participants as in the original standardization sample, we compared our pilot sample’s valence and arousal ratings of each picture to the US normative ratings. Results suggested that 41 of the 340 IAPS pictures and 33 of the 252 replacement pictures were problematic (i.e., South African ratings were statistically significantly different from US normative ratings). Of the 41 problematic IAPS pictures, 32 had already been earmarked for replacement during the previous stage in our development process. Furthermore, the pilot sample’s ratings for those 32 replacements were not significantly different from those of the equivalent pictures in the US normative database, and hence were not considered problematic.

Finally, we ran a six-person focus group (two men and four women, all South African and aged between 18 and 30 years; three had not completed high school and three had completed at least some university courses) to discuss the problematic pictures. After viewing each picture, the group discussed why South African ratings of that picture might have been different from US ratings, and what kind of replacement picture might deliver ratings more similar to the US normative standards.

At the conclusion of this process, we had developed a set of 340 pictures that constituted the SA-APS. Of those pictures, 79 had been retained from our original IAPS subset and 261 were ‘new’ pictures. Original and replacement pictures (which were sourced from a Google search of unpatented images) were matched by visual inspection in terms of composition (i.e., close-up, medium, or long shot), colour (bright versus dull colours), and activity (activities in which human or animal figures were involved).

Main study protocol

The protocol comprised a 10-minute screening phase and a 2-hour testing phase. All participants provided informed consent for both phases. We obtained ethical approval from our institution’s ethics committee, and our procedures complied with the Declaration of Helsinki (World Medical Association, 2004).

  • Screening. Individuals completed the following questionnaires, which determined their eligibility for the testing phase, either electronically (documents were exchanged via email) or in hard copy (a researcher administered the documents).

We used these standardized measures because of their brevity, strong psychometric properties, and history of use in LMICs. The Patient Health Questionnaire for Depression-9 (PHQ-9; Kroenke et al., 2003) assessed for the presence and severity of depressive symptoms over the 2 weeks prior to reporting. We excluded individuals with PHQ-9 scores ≥ 10 (an indication of possible MDD) from participation. The four-item Primary Care Post-Traumatic Stress Disorder Screen (PC-PTSD; Prins et al., 2003) enquires about lifetime trauma that has, over the month prior to reporting, directly affected aspects of the respondent’s daily functioning. We excluded individuals who answered ‘yes’ to ≥ 3 of the questions. The three-item Alcohol Use Disorders Identification Test Consumption (AUDIT-C; Bush et al., 1998) is designed to detect heavy drinking, active alcohol abuse, and/or alcohol dependence. Scores range from 0 to 12, with higher scores indicating greater risk behaviour. We excluded men who scored ≥ 7 and women who scored ≥ 5 (DeMartini & Carey, 2012). The 10-item Drug Abuse Screening Test (DAST-10; Skinner, 1982) is designed to establish severity of drug dependence. We excluded individuals with at least a moderate (score ≥ 3) level of dependence. Finally, a study-specific self-report sociodemographic questionnaire gathered data regarding age, sex, home language, English proficiency, highest level of education, and SES.

Those eligible for further participation were invited to participate in a testing session. Those who screened positive for psychiatric disorders, and who could be contacted, were referred to an appropriate counselling service.
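The screening cut-offs listed above can be summarized as a simple decision rule. The sketch below is illustrative only: the class and function names are hypothetical, and the psychotropic-medication check is taken from the eligibility criteria rather than from a screening questionnaire. Cut-off values are those stated in the text.

```python
from dataclasses import dataclass


@dataclass
class ScreeningScores:
    """Hypothetical container for one volunteer's screening data."""
    sex: str                 # "male" or "female"
    phq9: int                # PHQ-9 total score
    pc_ptsd_yes: int         # number of 'yes' answers on the PC-PTSD
    audit_c: int             # AUDIT-C total score (0-12)
    dast10: int              # DAST-10 total score
    on_psychotropics: bool = False


def exclusion_reasons(s: ScreeningScores) -> list:
    """Return the measures on which a volunteer meets an exclusion cut-off."""
    if s.sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    reasons = []
    if s.phq9 >= 10:                      # possible MDD
        reasons.append("PHQ-9")
    if s.pc_ptsd_yes >= 3:                # 'yes' to three or more items
        reasons.append("PC-PTSD")
    audit_cutoff = 7 if s.sex == "male" else 5
    if s.audit_c >= audit_cutoff:         # sex-specific cut-offs
        reasons.append("AUDIT-C")
    if s.dast10 >= 3:                     # at least moderate dependence
        reasons.append("DAST-10")
    if s.on_psychotropics:
        reasons.append("psychotropic medication")
    return reasons
```

An empty list indicates that the volunteer passed the psychiatric screen; the sociodemographic criteria (age, SES, and education) would be checked separately.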

  • Testing phase. We used E-Prime software (Psychology Software Tools, 2017) to administer the protocol. Each participant completed their test session of approximately 2 hours as part of a group, with group sizes ranging from 3 to 16. Overall, we ran 58 separate test sessions in venues ranging from university laboratories to adult education centres and private residences. Hence, some participants viewed the images on standard laboratory desktop computers (screen size = 18.5″) whereas others viewed them on research laptops (screen size = 15.6″). Small differences in screen size do not appear to have a significant effect on emotion elicitation protocols (Blok, 2021).

After being seated in front of a computer screen, the participant read a set of instructions adapted from Lang et al.’s (2008) technical manual (see Supplemental Material). They then completed a practice trial that required them to rate one positive, one negative, and one neutral IAPS image using the Self-Assessment Manikin (SAM; Bradley & Lang, 1994), a picture-oriented nine-point Likert-type scale used primarily as a self-report measure of valence and arousal. This practice trial sought to ensure that participants understood the valence and arousal dimensions they would be rating.

They then rated the complete IAPS and SA-APS sets, totalling 601 pictures. Each picture was presented for 6 seconds before participants were prompted to provide valence and arousal ratings using the SAM. Ratings of both valence and arousal were programmed as forced-choice responses. Presentation of the IAPS and SA-APS was counterbalanced so that half the participants viewed the IAPS first, while the other half viewed the SA-APS first. Pictures in each set were presented in random order to control for ordering effects. We gave each participant a 5-minute break and refreshment between presentation of the two picture sets. At the end of the session, participants were debriefed and compensated either via course credit or a shopping voucher valued at approximately US$ 7.
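The counterbalancing and within-set randomization described above can be sketched as follows. This is not the E-Prime implementation used in the study. It is a hypothetical Python illustration that assumes set order alternates with participant index (the text does not state how participants were assigned to orders) and that treats the two sets simply as lists of picture identifiers, without addressing how pictures common to both sets were handled.

```python
import random


def presentation_order(participant_index, iaps_ids, sa_aps_ids, seed=None):
    """Return [(set_name, shuffled_ids), ...] for one participant.

    Even-indexed participants see the IAPS first, odd-indexed
    participants see the SA-APS first (an assumed assignment rule).
    """
    rng = random.Random(seed)
    if participant_index % 2 == 0:
        blocks = [("IAPS", iaps_ids), ("SA-APS", sa_aps_ids)]
    else:
        blocks = [("SA-APS", sa_aps_ids), ("IAPS", iaps_ids)]
    # rng.sample with k == len(ids) returns a shuffled copy and
    # leaves the input list unchanged.
    return [(name, rng.sample(ids, k=len(ids))) for name, ids in blocks]
```

Passing a `seed` makes a given participant's order reproducible, which can help when auditing a session after the fact.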

Statistical analyses

We used SPSS software (version 26). The threshold for statistical significance was set at α = .05, with Bonferroni corrections made where appropriate. We interpreted effect size following convention (i.e., Cohen’s d = 0.2 was considered a small effect size, 0.5 medium, and 0.8 large; Cohen, 1988).
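For readers less familiar with these conventions, the following sketch shows a Bonferroni-adjusted threshold and the effect-size labels in code. It is illustrative only: the function names are hypothetical, the family size of six tests is an assumption used for the example, and treating Cohen's benchmarks as lower bounds of each band (with a 'negligible' label below 0.2) is one common reading rather than something stated in the text.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def effect_size_label(d):
    """Label |d| using Cohen's (1988) benchmarks as lower bounds."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"  # label assumed for values below 0.2


# Example: a family of six tests at an overall alpha of .05.
per_test_alpha = bonferroni_alpha(0.05, 6)
```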

The analyses proceeded as follows: First, we calculated internal consistency reliability (estimated using Cronbach’s alpha, Spearman-Brown, and split-half correlation coefficients) for the current IAPS and SA-APS valence and arousal ratings.
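As an illustration of two of these coefficients, the sketch below computes Cronbach's alpha and applies the Spearman-Brown correction to a split-half correlation. It is a minimal standard-library example, not the SPSS procedure used here; the function names are hypothetical, and the layout of the data (one list of scores per item, aligned across respondents) is an assumption.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of lists, one inner list of scores per item,
    with position j in every inner list belonging to respondent j.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def spearman_brown(r_half):
    """Step a split-half correlation up to full-length reliability."""
    return 2 * r_half / (1 + r_half)
```

Both functions use sample variances (n − 1 denominator), which is the usual convention for these coefficients.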

Second, using data from the IAPS manual (Lang et al., 2008), we calculated mean valence and arousal values for each IAPS image, based on ratings made using the nine-point SAM scale. For valence, all pictures with a mean rating < 4.00 were classed as negative, all with a mean rating > 5.99 were classed as positive, and the rest were classed as neutral. For arousal, all pictures with a mean rating < 4.00 were classed as low-arousal, all with a mean rating > 5.99 were classed as high-arousal, and the rest were classed as neutral.

Third, we tested Hypothesis 1 using (a) six one-sample t-tests that compared South African participants’ valence and arousal ratings for the subset of 340 IAPS pictures to the normative data from the US standardization sample for those pictures, and (b) a Pearson product–moment correlation that assessed the magnitude of the relationship between the current sample’s valence and arousal ratings of the IAPS pictures. This relationship is known as the affective space elicited by pictures. It should take on a boomerang-shaped distribution if functioning according to US normative standards (Bradley & Lang, 2007).
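The two statistics used in this step can be sketched with the Python standard library. The sketch is illustrative and rests on assumptions: the function names are hypothetical, and it assumes that each one-sample test compares participants' mean ratings for a picture category against the corresponding US normative mean. It returns the t statistic and degrees of freedom only; obtaining a p-value would require the t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test of `sample` against `mu0`."""
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, n - 1


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Applied to per-picture mean valence and mean arousal ratings, `pearson_r` gives the linear summary of the affective space referred to above; the boomerang shape itself is better judged from a scatterplot.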

Fourth, we tested Hypothesis 2 using (a) six one-sample t-tests that compared South African participants’ valence and arousal ratings for the SA-APS set of 340 pictures to the normative data (for each IAPS picture that remained part of the SA-APS, and for each IAPS picture that was the equivalent of a ‘new’ SA-APS picture) from the US standardization sample, and (b) a Pearson product–moment correlation that assessed the magnitude of the relationship between the current sample’s valence and arousal ratings of the SA-APS pictures.

Fifth, we tested Hypothesis 3 using six independent-sample t-tests that compared the IAPS valence and arousal ratings of the higher-SES/education group versus the lower-SES/education group. We then converted each participant’s average positive, negative, and neutral valence scores and high, low, and neutral arousal scores to z-scores using the formula (participant’s score − US normative mean) / SD of differences. Hence, each z-score captures a magnitude of difference from the US normative standard. We then averaged participants’ individual z-scores to determine, for each picture category, the overall z-score for each of the higher-SES/education and lower-SES/education groups. Scores closer to 0 reflect a closer match with US normative ratings. We determined the affective space for each SES/education group using two separate Pearson product–moment correlations.
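The z-score conversion described above can be written out as a short sketch. The function names are hypothetical, and the sketch assumes that the 'SD of differences' is the sample standard deviation of the participant-minus-norm differences within whichever set of participants is passed in; the text does not specify whether this was computed across the whole sample or within each group.

```python
from statistics import mean, stdev


def z_scores_vs_norm(scores, us_norm_mean):
    """Convert participants' category means to z-scores against a US norm.

    z = (participant's score - US normative mean) / SD of differences
    """
    diffs = [s - us_norm_mean for s in scores]
    sd_diff = stdev(diffs)  # equals stdev(scores), as the norm is a constant
    return [d / sd_diff for d in diffs]


def group_z(scores, us_norm_mean):
    """Average z-score for a group; values near 0 indicate a close match."""
    return mean(z_scores_vs_norm(scores, us_norm_mean))
```

Because the normative mean is a constant, subtracting it shifts the scores without changing their spread, so the denominator reduces to the standard deviation of the participants' own scores.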

Sixth, we created regression-based normative data for the valence and arousal ratings of the IAPS and SA-APS pictures. We took a regression-based approach, rather than a more traditional approach of stratifying by each relevant factor, because it allowed us to avoid diminishing the sample size to account for each predictor while still being able to adjust for multiple covariates. Separate linear backward regression models determined the relative contributions of sex (coded as a categorical variable with male = 0, female = 1), education level (measured continuously, with a range of 0–16), and SES (measured continuously, with a range of 1–12) to each of 12 IAPS and SA-APS outcome variables: positive, negative, and neutral valence; and high, low, and neutral arousal. Because demographic information was missing for 10 participants, the normative data were based on 140 participants’ picture ratings.
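To illustrate how a regression-based norm is applied once a model has been fitted, the sketch below computes a predicted rating from an intercept and a set of coefficients, using the predictor coding described above. The function name and the coefficient values in the example are placeholders invented for illustration; they are not the study's estimates, and predictors dropped during backward elimination are simply given a coefficient of zero.

```python
def predicted_rating(intercept, coefs, sex, education, ses):
    """Predicted mean rating for one outcome variable.

    coefs: dict with optional keys 'sex', 'education', 'ses';
    a missing key is treated as a coefficient of 0.
    Coding follows the text: sex 0 = male, 1 = female;
    education 0-16; SES 1-12.
    """
    if sex not in (0, 1):
        raise ValueError("sex must be coded 0 (male) or 1 (female)")
    if not 0 <= education <= 16:
        raise ValueError("education must lie in the range 0-16")
    if not 1 <= ses <= 12:
        raise ValueError("SES must lie in the range 1-12")
    return (intercept
            + coefs.get("sex", 0.0) * sex
            + coefs.get("education", 0.0) * education
            + coefs.get("ses", 0.0) * ses)


# Placeholder values for illustration only (not the study's estimates).
example = predicted_rating(5.0, {"ses": -0.1}, sex=1, education=12, ses=10)
```

An individual's observed rating can then be compared with this prediction, which is the sense in which the equations serve as norms adjusted for sociodemographic characteristics.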

Results

Sample characteristics

Participant age ranged from 18 to 30 years (M = 21.6 ± 2.85). All could read and understand English proficiently. Those assigned to the low-SES/education group (n = 70) had an average of 10 years of education (range = 0–12) and a total household monthly income range of ZAR 3000–6299. On average, their parents had completed 10 years of education, and modal parental employment fell within the ‘semi-skilled’ range (i.e., Hollingshead level 6). Those assigned to the high-SES/education group (n = 80) had an average of 13 years of education (range = 13–16) and a total household monthly income range of ZAR 35,500–39,499. Their parents’ modal highest level of education was in the tertiary education range (bachelor’s degree received), and modal parental employment fell within the ‘business managers of medium-sized businesses’ range (i.e., Hollingshead level 2).

All participants scored below the clinical cut-offs for measures of depression (PHQ-9: M = 4.6 ± 2.6, range = 0–9), PTSD (PC-PTSD: M = 0.7 ± 0.8, range = 0–2), alcohol use disorder (AUDIT-C: M = 2.2 ± 2.0, range = 0–7), and drug dependence (DAST-10: M = 1.4 ± 0.7, range = 1–3). No participant had taken psychotropic medication in the 12 months prior to reporting.

Psychometric properties

Reliability coefficients indicated that valence and arousal ratings for the set of 340 IAPS pictures, and for the set of 340 SA-APS pictures, were highly internally consistent (see Table 1).

Table 1 Internal consistency of the IAPS and SA-APS ratings in the current sample (N = 150)

Testing Hypothesis 1

Analyses detected three significant differences between the sample’s ratings of the current set of IAPS pictures and the ratings of those pictures by the US standardization sample. For positively valenced and high-arousal pictures, the US ratings were significantly higher than the South African ratings, whereas for low-arousal pictures the opposite held (see Table 2).

Table 2 Comparison of South African participants’ IAPS ratings to US normative standards (N = 150)

Overall, the IAPS ratings of South African participants were much closer to US normative standards in terms of valence than they were in terms of arousal. This impression is confirmed by examining the percentage of pictures within each category for which there was agreement between the current ratings and the normative ratings (i.e., the proportion for which both sets of ratings agreed that the picture belonged in that particular category): positive valence = 81% (109/135); negative valence = 89% (121/136); neutral valence = 86% (59/69); high arousal = 29% (17/59); low arousal = 19% (16/83); and neutral arousal = 43% (85/198).
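The agreement index reported here can be sketched as a small function. This is an illustration under stated assumptions: the function name is hypothetical, and the denominator is assumed to be the number of pictures placed in the category by the US normative ratings. The labels in the usage example are toy data, not study results.

```python
def category_agreement(norm_labels, local_labels, category):
    """Return (n_agree, n_in_category, percent) for one category.

    n_in_category counts pictures assigned to `category` by the
    normative ratings; n_agree counts those that the local ratings
    place in the same category.
    """
    if len(norm_labels) != len(local_labels):
        raise ValueError("label lists must have the same length")
    local_for_category = [
        loc for norm, loc in zip(norm_labels, local_labels) if norm == category
    ]
    if not local_for_category:
        raise ValueError("no pictures in this category")
    n_agree = sum(1 for loc in local_for_category if loc == category)
    n_total = len(local_for_category)
    return n_agree, n_total, round(100 * n_agree / n_total)


# Toy example with five pictures (not study data).
norm = ["positive", "positive", "negative", "neutral", "positive"]
local = ["positive", "neutral", "negative", "neutral", "positive"]
result = category_agreement(norm, local, "positive")
```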

Regarding the affective space of these IAPS stimuli given the data from the current sample, analyses detected a significant negative correlation between valence and arousal ratings, r = −.32, p < .001 (see Fig. 2, panel A). The direction and magnitude of that relationship between valence and arousal matched those of the US normative sample: There was a negativity bias, and the effect was medium-sized (US r = −.36, p < .01). Hence, the boomerang shape of the distribution in the US normative sample (see Fig. 2, panel C) was largely preserved in the South African sample.

Fig. 2

Relationship between valence and arousal in (A) South African participants’ ratings of the IAPS picture set, (B) South African participants’ ratings of the SA-APS picture set, and (C) US normative standards for the IAPS picture set

Testing Hypothesis 2

Initial analyses detected one significant difference between the sample’s ratings of the SA-APS pictures and the ratings of the corresponding IAPS pictures by the US standardization sample: For pictures categorized as having low arousal value, South African ratings were significantly higher than the US ratings (see Table 3). Of note, however, is that this significant difference did not survive the Bonferroni correction.

Table 3 Comparison of South African participants’ SA-APS ratings with US normative standards for corresponding IAPS pictures (N = 150)

Again, the picture ratings of South African participants were much closer to US normative standards in terms of valence than they were in terms of arousal. This time, the proportions of pictures for which both sets of ratings agreed that the picture belonged in a particular category were as follows: positive valence = 85% (115/135); negative valence = 90% (122/136); neutral valence = 80% (55/69); high arousal = 27% (16/59); low arousal = 20% (17/83); and neutral arousal = 41% (81/198).

Regarding the affective space of the SA-APS stimuli for the current sample, analyses detected a significant negative correlation between valence and arousal ratings, r = −.32, p < .001 (see Fig. 2, panel B). Again, the direction and magnitude of that relationship matched those of the US normative sample: there was a negativity bias, and the effect was medium-sized. Hence, the boomerang shape that characterizes the distribution of correlations between valence and arousal ratings of IAPS pictures in the US normative sample (see Fig. 2, panel C) was largely preserved in this South African sample’s ratings of the new SA-APS picture set.

Testing Hypothesis 3

As Table 4 shows, analyses detected three significant between-group (higher SES/education versus lower SES/education) differences for the IAPS picture ratings. Specifically, participants in the latter group rated positively valenced pictures as being significantly more pleasant but rated both high- and low-arousal pictures as being significantly more neutral. These between-group differences all survived the Bonferroni correction, and were associated with moderate-to-large effect sizes.

Table 4 Between-group comparison: IAPS picture ratings of higher-SES/education versus lower-SES/education participants (N = 150)

Figure 3 corroborates the results presented in Table 4. Panel A shows an almost perfect linear distribution, suggesting that valence ratings were relatively similar across groups. In contrast, panel B shows reasonable variability along the y-axis (arousal ratings of higher-SES/education participants) but a lack of such variability along the x-axis (arousal ratings of lower-SES/education participants).

Fig. 3

Relationship between (1) valence ratings of participants with higher SES/education and those with lower SES/education (panel A), and (2) arousal ratings of participants with higher SES/education and those with lower SES/education (panel B)

Table 5 presents the category-relevant z-score data and one-sample t-test results, relative to the US normative standards, for the two groups. Ratings of participants with higher SES/education were significantly different from the US norms in three categories (positive valence, neutral valence, high arousal). Ratings of participants with lower SES/education were significantly different from the US norms in two categories (high arousal and low arousal). Data patterns were quite different across groups. Specifically, all ratings in the lower-SES/education group were much more variable (as indicated by larger SD values), and their arousal ratings tended much more toward a mid-range (i.e., on average, they rated high-arousal pictures as significantly less arousing than the US normative standard and low-arousal pictures as significantly more arousing than that standard). In contrast, participants in the higher-SES/education group tended to rate all pictures as being more arousing than the US normative standard (significantly more so in the case of high-arousal pictures). Moreover, with regard to valence, participants in the higher-SES/education group provided ratings that were significantly different from the US normative standard (broadly speaking, these tended to be less positive than those of the standardization sample), whereas those in the lower-SES/education group provided ratings more similar to the normative standards.

Table 5 IAPS ratings of South African participants: Descriptive statistics and comparisons to US normative standards (N = 150)

Regarding the affective space of the IAPS stimuli for the two groups, analyses detected a significant positive correlation between valence and arousal ratings for the lower-SES/education group, r = .69, p < .001, and a significant negative correlation between valence and arousal ratings for the higher-SES/education group, r = −.72, p < .001. As Fig. 4 shows, plotting these mean ratings against one another resulted in boomerang-shaped distributions. However, the shape of the two boomerangs was skewed differently. Whereas participants in the lower-SES/education group tended to rate positively valenced pictures as having fairly high arousal value but negatively valenced pictures as more moderately arousing, participants in the higher-SES/education group tended to provide ratings more similar to the US normative sample.

Fig. 4

Relationship between mean valence and arousal ratings for (a) participants in the lower-SES/education group (n = 70) and (b) participants in the higher-SES/education group (n = 80) on the IAPS picture set

Regression-based normative data for the IAPS and SA-APS images

Regression modelling indicated that only sex and SES were significant predictors of the current sample’s IAPS and SA-APS ratings (see Table 6). Regarding the effects of SES on IAPS ratings, participants with higher levels of SES tended to give significantly lower ratings of positively valenced and low-arousal images, and significantly higher ratings of high-arousal images. Regarding the effects of sex on IAPS ratings, women tended to give significantly lower ratings of negatively valenced images and significantly higher ratings of high- and neutral-arousal images.

Table 6 Stepwise regression analyses: Predictors of IAPS and SA-APS valence and arousal ratings (N = 140)

Regarding the effects of SES on SA-APS ratings, participants with higher levels of SES tended to give significantly lower ratings of low-arousal images and significantly higher ratings of high-arousal images. Regarding the effects of sex on SA-APS ratings, women tended to give significantly lower ratings of positively valenced images and significantly higher ratings of high-arousal images.

Table 7 presents the consequent regression equations that can serve to generate preliminary normative data.
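To show how a regression equation of this kind is typically applied, the sketch below computes a demographically predicted rating and converts an observed rating into a z-score by dividing the residual by the residual SD of the regression model. This is the conventional approach to regression-based norms and is assumed here rather than confirmed by the text; the class name, the coefficients, the coding of sex and SES, and the residual SD are all hypothetical placeholders and are not the values in Table 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormEquation:
    """Regression-based norm for one picture category (hypothetical values)."""

    intercept: float
    b_sex: float        # weight for sex (assumed coding: 0 = man, 1 = woman)
    b_ses: float        # weight for the SES score (assumed numeric scale)
    residual_sd: float  # SD of the regression residuals

    def predicted(self, sex: int, ses: float) -> float:
        """Rating expected for a person with these demographics."""
        return self.intercept + self.b_sex * sex + self.b_ses * ses

    def z_score(self, observed: float, sex: int, ses: float) -> float:
        """Demographically corrected z-score for an observed rating."""
        return (observed - self.predicted(sex, ses)) / self.residual_sd


# Placeholder equation for a single category; NOT taken from Table 7.
high_arousal = NormEquation(intercept=6.0, b_sex=0.5, b_ses=-0.1, residual_sd=1.0)

expected = high_arousal.predicted(sex=1, ses=10)
z = high_arousal.z_score(observed=7.0, sex=1, ses=10)
print(f"expected = {expected:.2f}, z = {z:.2f}")
```

In use, one such equation would be defined per picture category and picture set, with the coefficients replaced by the corresponding values from the normative table.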

Table 7 Regression-based normative data: IAPS and SA-APS picture sets (N = 140)

*p < .05. **p < .01. ***p < .001

Discussion

The current study aimed to demonstrate that the International Affective Picture System (IAPS; Bradley & Lang, 2007) should be modified before being used in low- and middle-income countries (LMICs; specifically, South Africa), and that sociodemographic variables (in particular, variations in socioeconomic status [SES]) must be considered when interpreting IAPS valence and arousal ratings. Our first step was to evaluate a set of 340 IAPS images and to replace those that (a) depicted scenes that would be relatively unfamiliar to most South African adults and/or (b) included only white people. After that evaluation and replacement, we had created a set of pictures (termed the South African Affective Picture System [SA-APS]) that retained 79 of the original IAPS images and included 261 ‘new’ images that were deemed appropriate to the South African context.

In our sample of 150 healthy South African adults, the internal consistency and split-half reliability of the IAPS and SA-APS picture sets were both similar to those of the IAPS in the US standardization sample (i.e., > .85; Lang et al., 2008). This result suggests that the strong psychometric properties of the IAPS (a) were maintained in the current LMIC context and (b) appeared to extend to the SA-APS. Our confidence in the reliability of both picture sets allowed us to proceed to the next phases of hypothesis testing.
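For readers who wish to check internal consistency in their own data, the following minimal sketch computes Cronbach's alpha from a participants-by-pictures matrix of ratings. It assumes that alpha is the index of interest, which the passage above does not specify; the function name and data are hypothetical.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for rows = participants, columns = items."""
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular matrix with at least 2 items")
    # Variance of each item (column) across participants.
    item_vars = [statistics.variance([row[j] for row in scores]) for j in range(k)]
    # Variance of each participant's total score.
    total_var = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 4 participants x 3 pictures (not real study data).
ratings = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [4, 5, 6],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.2f}")
```

The toy items here are perfectly correlated, so alpha reaches its maximum; real rating data would yield a lower value.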

Hence, we investigated whether the stimuli within our set of IAPS pictures and those within the SA-APS picture set elicited emotions in South African participants similar to those that the original IAPS stimuli elicited in the US standardization sample. Results confirmed that our sample’s ratings of SA-APS images were better aligned with the US normative standards than their ratings of IAPS images. Specifically, significant differences from those norms were present for only one category of SA-APS pictures (low arousal) but for three categories of IAPS pictures (positive valence, high arousal, low arousal).

Moreover, we observed that (a) the current sample’s ratings (for both IAPS and SA-APS images) were closer to the US normative standards for valence than for arousal, and (b) South African participants tended to deliver more conservative ratings than the US normative sample. Specifically, they rated positively valenced IAPS pictures as being less pleasant, high-arousal IAPS pictures as being less arousing, and low-arousal IAPS and SA-APS pictures as being more arousing. In other words, their ratings tended more toward the middle of the scale than toward the extremes.

Our evaluation of the sample’s affective space (i.e., correlation between valence and arousal ratings) for both picture sets confirmed these impressions: Although generally speaking there was a negativity bias and a medium-sized correlation, the expected boomerang-shaped distribution was flatter at its edges in the current sample than in the standardization sample. Studies from other LMICs show similar results (see, e.g., Drače et al., 2013; Dufey et al., 2011; Lohani et al., 2013).

The results described thus far suggest that our sample rated IAPS images differently from how the US standardization sample rated them. The next step in our investigation examined whether particular sociodemographic factors (i.e., SES and highest level of educational attainment) may explain these differences. Regarding valence, we found that participants with higher levels of SES and education tended to rate positive and neutral pictures less positively than the US normative standards, whereas those with lower levels of SES and education provided ratings similar to those of the standardization sample. Regarding both high- and low-arousal pictures, ratings of participants with lower levels of SES and education were significantly different from US normative standards (i.e., they tended more towards mid-range ratings for both types of pictures). In contrast, participants with higher levels of SES and education tended to rate high-arousal pictures as more arousing than the standardized scores.

Our findings are consistent with those from several cross-cultural IAPS studies showing that LMIC participants provide arousal ratings substantially different from those of the standardization sample (see, e.g., Dufey et al., 2011; Lohani et al., 2013; Soares et al., 2015). Furthermore, the literature suggests that individuals with differing levels of SES process emotional stimuli differently (Côté et al., 2010; Farah, 2017; Noble et al., 2012). In our sample, the mid-range arousal ratings of participants with lower levels of SES could be understood as either a misinterpretation of the arousal construct or a numbing of emotional responsiveness to arousing stimuli. Regarding the latter, participants living in low-income areas in South Africa are exposed to high levels of crime and violence and as a result may have been desensitized to the types of stimuli presented in the IAPS. Regarding their tendency to rate low-arousal pictures as more arousing, some of those pictures (e.g., those depicting a violin or a chess board) may be more novel to them than to high-SES participants (including those in the standardization sample). Novel pictures are typically rated as more arousing than familiar pictures (Blackford et al., 2010).

The finding that participants with higher levels of SES and education tended to rate positively valenced and neutral pictures as less positive than the IAPS standardization ratings is not as easy to explain. Most cross-cultural IAPS studies conducted in high-income countries find no such effect (see, e.g., Deák et al., 2010; Verschuere et al., 2001; Vila et al., 2001). We speculate that some of the positive and neutral images may be outdated and therefore carry less value for individuals, such as those in our higher-SES/education group, who are used to a high volume of positive images presented through social media.

In summary, participants’ ratings of the SA-APS images were more similar to US normative standards than were their ratings of the IAPS images. These results suggest that our modifications to the original picture set were, to at least some degree, successful. However, our other analyses showed that sociodemographic characteristics (i.e., SES and education) significantly influence how people rate the images. Hence, if such stimuli are to be used in populations where there is substantial variation in these characteristics (e.g., in an LMIC such as South Africa), normative data must be stratified appropriately. We developed a set of regression-based norms that might meet this challenge.

Initially, we included sex, SES, and level of education as predictors in our regression models. SES and education were the variables that defined our two separate groups, and we included sex alongside them because IAPS standardization and cross-cultural studies suggest that men and women provide different valence and arousal ratings (Bradley & Lang, 2007; Deák et al., 2010; Soares et al., 2015).

The regression results indicated that only SES and sex were significant predictors of participants’ ratings. Education was not a significant predictor, perhaps because the range of educational achievement was fairly restricted in our sample—most people had completed at least 12 years of schooling. Overall, SES accounted for the most variance. We speculate that SES is an important predictor of emotional response because it determines the kinds of experiences people have and the value they ascribe to them. Our speculation is consistent with similar conjecture, presented elsewhere, suggesting that there is cultural variation in the interpretation of affective stimuli (Lohani et al., 2013; Taras et al., 2016).

Sex accounted for significant variation in emotional responses: Women tended to give higher ratings in response to high-arousal (IAPS and SA-APS) and neutral-arousal (IAPS) pictures and more negative ratings in response to negatively valenced pictures (IAPS). These findings are broadly aligned with extant literature showing that women tend to give more emotive ratings (that is, rate negative pictures as more negative and high-arousal pictures as more arousing; Deák et al., 2010; Soares et al., 2015; Vila et al., 2001). Of note here is that whereas sex predicted participant ratings for three IAPS picture categories, it only predicted ratings for one SA-APS category, suggesting that the latter picture set is less likely to produce gender-variable responses.

Limitations

First, we note that our norming protocol deviates somewhat from the original IAPS norming protocol (Bradley & Lang, 2007). The deviations are primarily along the dimensions of (a) picture presentation, (b) participant response, (c) the number of pictures presented per participant, and (d) sample size (see Supplementary Material Table S1 for a complete list of all deviations).

In the IAPS validation study, Bradley and Lang (2007) presented the stimuli to groups of participants on a 4 × 5 ft screen, with each image projected for 5 seconds. In contrast, we presented the stimuli to each participant individually using a standard laptop or desktop screen, with each image displayed for 6 seconds. In the original study, participants had 15 seconds to rate pictures for valence, arousal, and dominance, whereas our participants had an unlimited period in which to rate valence and arousal and could not proceed to the next picture until they had provided both ratings. Our approach ensured that ratings were self-paced and that we did not have missing data.

In the IAPS validation study, different cohorts of participants rated 60 pictures in a single session. In contrast, each participant in the current study rated all 601 pictures (i.e., the complete IAPS set and the complete SA-APS set). Although the original study design avoids fatigue effects by requiring the rating of relatively few pictures, it also introduces the potential for between-cohort confounds. Our design avoided this latter difficulty, but added a potential fatigue confound. To minimize the effects of this potential confound, we did not require participants to complete their ratings of all 601 pictures in one uninterrupted session. Instead, they rated half of the pictures, took a 5-minute break, and then rated the other half.

Additionally, our sample size (N = 150) was relatively small in comparison to the IAPS norming protocol. Bradley and Lang (2007) had groups of up to 100 participants rate their groups of pictures (60 images per group; the authors do not confirm a total sample size, but it is estimated at 1970). The size of our sample also meant that the range of educational achievement was fairly restricted; recruiting a larger sample might have allowed more variation in educational achievement. Nonetheless, our sample size was adequate for our purposes, especially because we took a regression-based approach to creating normative data.

Overall, our norming protocol more closely resembles the way in which the IAPS is used in current research studies, with large numbers of images presented on a computer screen to a single participant (see, e.g., Bolinger et al., 2019; Cellini et al., 2016). This aspect of ecological validity adds to the strength of our research.

Second, our recruitment strategy focused on populating two groups: higher SES/education and lower SES/education. Hence, we are unable to comment on how high-SES individuals with low educational achievement, or low-SES individuals with high educational attainment, would respond to emotional pictures.

Third, our participants completed all study procedures in English. Although they were proficient in English, the arousal dimension of emotion is known to be more difficult to conceptualize for those who are not first-language English speakers. Future studies may benefit from translating instructions to suit each participant’s home language, thereby facilitating adequate comprehension of study constructs.

Summary and conclusion

We developed and tested a South African version of the IAPS. Our findings showed that South African participants’ ratings of SA-APS images were more similar to the US normative standards than their ratings of IAPS images. Nonetheless, there were still significant discrepancies in picture ratings from the US normative standards. We, therefore, developed regression-based norms to account for these differences and found that sex and SES predicted participant responses.

Unlike other studies in the cross-cultural IAPS literature, we sampled not only from university students but also from community populations. Our findings showed that important demographic factors such as SES influence the way individuals respond to emotionally valenced and arousing information. These results suggest that future cross-cultural studies should measure and account for SES in their models and also that responses given by student participants cannot be generalized to non-student groups.

Another aspect of the novelty of our study is that we provide a template for other cross-cultural IAPS studies to follow: We identified pictures that were problematic and replaced them with more culturally relevant pictures; we tested the newly constituted picture set in a socioeconomically and educationally diverse sample; and we developed regression-based norms that allowed us to account for relevant demographic factors predicting responses to both the original IAPS and our redeveloped South African version. These norms are especially useful because, within the field of emotion research, few studies emerge from LMICs and there are few psychometric tools whose development and norms account for substantial diversity in SES, education, and other sociodemographic factors.

In conclusion, having a reliable and valid instrument for the measurement of emotional response is crucial when studying both clinical and non-clinical populations. Because issues concerning emotional health range broadly from everyday stress to psychopathology to emotional impairment caused by brain damage, having a flexible and culturally fair tool allows widespread opportunity for scientific investigation of these issues. Extending the reach of tools such as the IAPS to LMICs can help ensure a more globally representative body of research in this field.