Emotional contents are remembered better than nonemotional contents. This outcome has been reported both quantitatively, indicating that emotional contents are recalled or recognized better than neutral contents (see Buchanan & Adolphs, 2002), and qualitatively, indicating that memory is richer in detail for emotional than for nonemotional contents (Doerksen & Shimamura, 2001; Kensinger & Corkin, 2003). This memory enhancement for emotional contents is also supported by neuroimaging and neuropsychological data that have indicated that emotional and neutral contents engage different cognitive and neural mechanisms (see Hamann, 2001).

In contrast to the extensive evidence for the effect of emotion on studied events, it is not yet clear how emotion modulates false memory of nonstudied events. Inasmuch as falsely remembering events that have never occurred is a pervasive phenomenon (see Gallo, 2006), whether and how emotional experience in particular may be falsely remembered is an important question. Past research has shown that emotional events can be falsely remembered. For example, Laney and Loftus (2008) assessed the emotional content of true and false memories for childhood events and showed that false memories of emotional events could be implanted via a suggestive manipulation. They also compared true and false memories on several emotional dimensions (e.g., emotion specificity and intensity) and found that those implanted false memories were as subjectively emotional as true memories in most dimensions, indicating that the substantial emotionality of an event does not necessarily ensure memory accuracy. Bajo, Fleminger, and Kopelman (2010) also recently showed that the autobiographical memories of memory-disordered patients have an affective bias in which confabulated memories often have more of an emotional tone to the patients than do true memories.

Yet there is no consensus regarding whether emotional content affects the likelihood of memory distortion. Past research that has directly examined the effect of emotion on false memories has reported mixed findings, in which emotional valence has either increased (e.g., Gallo, Foster, & Johnson, 2009) or decreased (e.g., Kensinger & Corkin, 2004; Pesta, Murphy, & Sanders, 2001) false memories. These divergent findings in the literature, for the most part, seem to depend on the types of stimuli employed and the aspects of the stimuli that become salient during encoding. Such methodology-dependent conflicts in the past findings suggest that the interplay among the various factors that affect true memory for emotional content (e.g., valence, arousal, distinctiveness, conceptual relatedness, attention, and retention interval) may also influence the distortion of memory for emotional content. However, these factors may influence true and false memory in different directions; a factor that contributes to enhanced true memory can either increase or decrease false memory for emotional content. As a result, the underlying factors influencing true and false memory for emotional material are not yet fully understood.

The goal of the present study was to examine the effect of emotion on memory accuracy and distortion, with a specific focus on two inherent features of emotional content: distinctiveness and conceptual relatedness. In the following sections, we will discuss the roles of these two mechanisms in modulating memory for emotional content.

The role of distinctiveness

An item is well remembered when it has certain features that make it distinctive from other items. This phenomenon has been referred to as the distinctiveness effect in the memory literature (see Schmidt, 1996, for a review). This effect has been observed in numerous studies using words, with manipulations of color (e.g., Bireta, Surprenant, & Neath, 2008), size (e.g., Kelley & Nairne, 2001), or semantic category (Geraci, McDaniel, Manzano, & Roediger, 2009; Geraci & Rajaram, 2004), as well as studies using spatial information (e.g., Guérard, Neath, Surprenant, & Tremblay, 2010).

The emotional enhancement of true memory is often attributed to the distinctiveness of emotional content. Although emotional stimuli may benefit from secondary distinctiveness, with emotional stimuli being more likely than neutral stimuli to elicit autonomic reactions and cognitive thought, primary distinctiveness may also play a large role. For instance, studies have shown that the emotional enhancement of memory is sometimes confined to a mixed-list design in which emotional and neutral stimuli are intermixed within a list. The emotional enhancement effect is substantially diminished when the primary distinctiveness of emotional stimuli is removed in a pure-list design (Dewhurst & Parry, 2000; Schmidt & Saari, 2007; see also Talmi, Luk, McGarry, & Moscovitch, 2007, for discussion).

Regarding the role of distinctiveness in memory distortion, distinctive information about particular items reduces false recognition of nonstudied items. In Israel and Schacter (1997), participants studied materials with either words or pictures (black-and-white line drawings) and later engaged in a recognition task in which only words were presented as retrieval cues. The results revealed that the pictorial encoding significantly reduced false recognition of lures as compared to the word encoding, which suggested that even when only words were provided as retrieval cues, participants were sufficiently able to reject a lure word because they did not remember seeing a corresponding picture of the word. In other words, the distinctive qualities of pictures enabled the use of a metacognitive process, namely a distinctiveness heuristic, to reduce false recognition (see also Gallo, Cotel, Moore, & Schacter, 2007; Schacter, Israel, & Racine, 1999). Consequently, Schacter, Gallo, and Kensinger (2007) argued that emotional content also can be used as a cue to engage a distinctiveness heuristic, and thereby suppress false memory of nonstudied events, as emotional contents can increase recollective distinctiveness (e.g., Ochsner, 2000; Schmidt, 2007).

Pesta, Murphy, and Sanders (2001) investigated false memory for emotional words by having participants study associates that were orthographically related to either emotional (e.g., hell) or nonemotional (e.g., peach) critical lures. Young adults falsely recognized nonemotional lures more often than emotional lures, suggesting that participants used the emotional distinctiveness of the emotional lures to keep them from falsely recognizing these lures. In an investigation of the effects of age on this phenomenon, Kensinger and Corkin (2004) used the same list of words and found that older adults, who are more susceptible to false memories than younger adults (Koutstaal & Schacter, 1997), were also able to reduce their false memories by getting benefits from the emotional lures’ emotional distinctiveness.

Notably, the aforementioned studies showing reduced false recognition of emotional items used an adapted version of a Deese/Roediger–McDermott (DRM; Deese, 1959; Roediger & McDermott, 1995) type of stimuli. In the DRM paradigm, studying a semantically related list of words, such as table, sit, legs, seat, and cushion, is likely to elicit strong false recognition and false recall of a critical lure (chair, in this case; Stadler, Roediger, & McDermott, 1999). Pesta et al. (2001) and Kensinger and Corkin (2004) used orthographically, not semantically or conceptually, related lists of words (e.g., bell, dell, jell; leach, teach, reach), such that the corresponding critical lures were either emotional (e.g., hell) or nonemotional (e.g., peach). As an item’s distinctiveness can arise from a variety of sources (perceptual, orthographic, conceptual, emotional, or visual, to name a few), depending on the context in which the item occurs (Schmidt, 1996), the critical lures in these stimuli could have stood out either because they were conceptually distinctive or because they were emotionally distinctive, relative to the context of orthographically related words. In other words, since the manipulation of conceptual distinctiveness (i.e., conceptual incongruence) and relatedness was absent in the stimuli, as Kensinger and Corkin (2004) argued in their discussion, it is not clear whether the reduction of false recognition of emotional items was due to the emotional nature of the items per se, or rather to the conceptual distinctiveness of the items. Also, it is worth noting that the critical lures used in those studies were extremely negative or taboo words (e.g., slut, whore, or rape) that were already very distinctive by their very nature, in terms of their low frequency and unexpectedness in the context of a psychology experiment, not necessarily only in terms of emotionality. Thus, emotional distinctiveness alone might not have enabled the use of the distinctiveness heuristic and, in turn, decreased false memory.

The role of conceptual relatedness

In considering the effects of emotional content on true memory, the role of conceptual relatedness has emerged in the context of the effects of delay on emotional memory. It is well documented that the long-term effects of emotional arousal benefit from modulation of consolidation processes. This modulation model (Cahill & McGaugh, 1998; McGaugh, 2004) was developed from rodent models, highlighting the importance of interactions between the basolateral nucleus of the amygdala and the hippocampus; evidence consistent with this model has been acquired in numerous studies in humans, as well (e.g., Anderson, Wais, & Gabrieli, 2006; Cahill & Alkire, 2003). Yet emotional enhancement of memory occurs not only after delays of sufficient length to allow for modulatory effects of consolidation, but also after relatively brief delays. The mechanisms supporting this fast-acting enhancement have been debated (see Talmi, Anderson, Riggs, Caplan, & Moscovitch, 2008, for a discussion), but one proposal is that emotional items are well remembered because they are conceptually related to one another, and therefore can be more easily organized in memory (Talmi & Moscovitch, 2004).

However, this conceptual-relatedness account has been proposed to lead not only to increases in hit rates (Talmi & Moscovitch, 2004), but also to increases in false alarm rates for emotional items, because the emotional stimuli are more confusable due to their overlapping features (Gallo et al., 2009; see Brainerd, Stein, Silveira, Rohenkohl, & Reyna, 2008, and Bauer, Olheiser, Altarriba, & Landi, 2009, for discussion of whether such conceptual relatedness is greater for stimuli with negative or positive valence).

Gallo et al. (2009) noted that the greater conceptual relatedness reported for negative stimuli (Brainerd et al., 2008) might be attributable to the use of words, which are less distinctive and less emotionally arousing than pictures (see also Bauer et al., 2009; Kapucu, Rotello, Ready, & Seidl, 2008). Gallo et al. (2009) adapted pictures from the International Affective Picture System (IAPS) set (Lang, Bradley, & Cuthbert, 2005), which contains emotionally arousing images of people, objects, and animals, and found that false recognition was greater for emotional pictures, both positive and negative, than for neutral pictures. However, as the authors noted, even in the case of these stimuli, it was not clear whether the conceptual overlap in the emotional items was due to the categorizable nature of the stimuli (people, activities, etc.) that might induce certain emotions, or to the use of valence as a category (happy items, sad items, etc.). Also, to presage the question examined in the present study, the ratings of content overlap that Gallo et al. obtained from an independent group of participants indicated that negative and positive items were significantly more conceptually related than were neutral items. Such ratings data indicate that the emotional stimuli had higher conceptual overlap to begin with than did the neutral stimuli, which raises the question of whether the increased conceptual relatedness of valenced stimuli is an inherent property of those stimuli, or whether it reflects a confound in the particular stimulus sets that researchers often use.

The present study

In sum, past research has suggested that both the distinctiveness and conceptual-relatedness accounts can explain the enhanced true memory for emotional content, but the operations of the two accounts would result in opposing effects on false memory: Distinctiveness would decrease memory distortion by enabling the use of the distinctiveness heuristic (distinctiveness heuristic account), whereas conceptual relatedness would increase memory distortion (conceptual-relatedness account).

To examine these hypotheses, in the present study we aimed to assess the effects of emotion on true and false memory by equating the thematic relatedness of emotional and neutral stimuli. This was achieved by utilizing categorically bound stimuli for each type of emotional valence (negative, neutral, and positive). By equating the thematic relatedness of items within each valence group, we controlled for the extent to which the selected emotional stimuli might be better categorized than neutral stimuli for reasons other than valence itself. That is, the enhanced conceptual relatedness of emotional stimuli is likely to be influenced by two factors: the thematic grouping of the stimuli themselves, and the relatedness created by the reactions of participants. The former relatedness would arise because negative and positive stimuli would be likely to be selected from a smaller subset of categories than would neutral stimuli (Talmi et al., 2007; Talmi & Moscovitch, 2004). The latter relatedness would arise because there might be a greater actual or perceived similarity in the reactions elicited across a range of negative stimuli or positive stimuli than in those evoked by neutral stimuli, which might increase the tendency for participants to cluster stimuli of like valence together (Sison & Mather, 2007).

By selecting stimuli that were all thematically related to one another, we could test whether emotional stimuli were grouped on the basis of valence (promoting conceptual relatedness driven by valence) or whether valence fails to override the other forms of grouping that were now equated across emotional and neutral stimuli. If emotional stimuli are grouped on the basis of valence, then the conceptual relatedness of emotional stimuli should remain higher than the conceptual relatedness of neutral stimuli, even when the thematic relations of the two were matched, likely leading to enhanced false memories for the emotional stimuli. If, by contrast, valence fails to override the other forms of grouping, then false-memory rates should be no higher for emotional than for neutral stimuli. Because little evidence exists for the effects of valence on memory when all of the stimuli (emotional and neutral) are controlled for thematic relatedness, our study was aimed at understanding emotional memory effects under these conditions for both true and false memories.

Experiment 1

This experiment employed two memory tasks to obtain converging evidence: We used a recognition memory task to evaluate the hits and false alarms to valenced items, and a cued-recall task to evaluate correctly recalled items and intrusions. Thematic relatedness has previously been shown to influence performance on both of these types of tasks, with participants often falsely recognizing or recalling lure items that are associated with a study list (Deese, 1959; Roediger & McDermott, 1995). We hypothesized that valenced information would lead to higher hits in both recognition and cued-recall tasks, on the basis of evidence in the literature that had shown enhanced true memory for emotionally valenced information (LaBar & Cabeza, 2006).

The two tasks also provided a test of the two theoretical accounts of memory distortion for emotional content (the distinctiveness heuristic and conceptual relatedness) through an examination of the false memory responses. We reasoned that the ability to use the distinctiveness heuristic might be weakened when strong categorical relations link the studied and nonstudied items, as the strength of the categorical relationship might interfere with the emotional distinctiveness of the items. If valenced stimuli lead to lower false alarms than do neutral stimuli, even under these conditions in which conceptual relatedness is highly matched across valence, it would indicate that participants are still able to benefit from the distinctiveness of valence to suppress false alarms. On the other hand, if valenced stimuli lead to higher false alarms than do neutral stimuli, it would mean that the valence-based grouping process can override the category-based grouping process and thereby increase the memory distortion for valenced stimuli.

Method

Participants and design

A group of 48 Stony Brook University students participated for course credit, with 24 participants each in the recognition and cued-recall task conditions.

Materials

Normative stimuli that had been selected and modified by Kensinger and colleagues were used (see Kensinger, 2007), as these stimuli enabled the manipulation of both valence (negative, neutral, and positive) and the categorical structuring of items within each valence. For example, the negative stimuli included categories such as war (with nuclear bomb, battle ship, etc., as category members), funeral, and dental instruments; the positive stimuli included categories such as pets (with kitten, puppy, etc., as category members), toys, and flowers; the neutral stimuli included categories such as office items (with book shelf, rolling chair, etc., as category members), materials, and geography. The target stimuli were 360 items for which both the word and pictorial counterparts were included in the total stimulus set. We included 120 items in each valence (negative, neutral, and positive), drawn from a total of 45 categories (15 categories per valence), resulting in eight items per category. On the basis of prior norming studies, all negative photo objects were rated lower than 4 on a valence scale of 1–9 (with 1 being the most negative) and higher than 5 on an arousal scale of 1–9 (with 9 being the highest arousal). All positive photo objects were rated as higher than 5 on valence and higher than 5 on arousal. The negative and positive photo object sets did not differ in arousal (p > .25) but differed significantly in valence (p < .001). All of the neutral photo objects were rated between 3 and 6 on valence and lower than 5 on arousal. The items did not differ across valences in frequency, familiarity, or imageability (norms from the MRC database, all ps > .15), or in visual complexity, Fs < 1.5, ps > .25, as determined by normative data from 20 young adults. The items also did not differ in the numbers that included people, inanimate objects, animals, or landscapes across valence and categories. This matching was done by selecting the categories and photo exemplars in triplicate (e.g., there were equivalent numbers of images with people in the “in a hospital” [negative], “in a restaurant” [positive], and “in a school” [neutral] categories).

The valence of the verbal labels accompanying the photo objects differed significantly: negative (M = 3.29, range = 1.33–6.00), neutral (M = 5.11, range = 3.33–7.00), and positive (M = 6.33, range = 4.33–8.00) (ps < .001). The arousal ratings of the verbal labels for the negative (M = 5.11, range = 1.33–8.00) and positive (M = 5.11, range = 3.00–7.33) photo objects did not differ (p > .96), but, as would be expected, both were higher than the arousal ratings of the verbal labels for neutral photo objects (M = 3.34, range = 1.33–5.00) (ps < .001). The ratings for the category labels were as follows: for valence, negative (M = 2.47, range = 1.67–3.33), neutral (M = 4.80, range = 4.00–5.67), positive (M = 5.80, range = 4.67–6.67); for arousal, negative (M = 5.11, range = 3.00–6.33), neutral (M = 3.00, range = 2.33–3.67), positive (M = 4.96, range = 3.33–6.00).

Procedure

Encoding

At the beginning of the study session, the participants were instructed, in both written and spoken forms, that they would see a series of words, the corresponding pictures, and the category names. They were asked to rate each word for its goodness of fit to the category on a scale of 1–5 (1 = average, 3 = good, 5 = outstanding). We anchored the scale so that the lowest rating would not mean a bad fit, as we intentionally made all of the items related to the category. The participants were also informed that they would be given a memory test on the study list later in the experiment. The nature of the memory test was unspecified.

Each study-phase trial included a fixation (1 s) and a study display (5 s). On a white screen, the category name was presented on the top, a picture (300 × 300 pixels) and its corresponding word label (Verdana 20-point font) were presented vertically in the center, and the goodness-of-fit scale was presented at the bottom. After performing ten practice trials, the participants performed the encoding task on 225 trials, presented in a randomly intermixed order with respect to category. Five of the items from each category were presented as study items, while three items were reserved to serve as nonstudied items for the assessment of false alarms (in the recognition memory test) or intrusions (in the cued-recall test), resulting in 135 stimuli to be later used as nonstudied items. The studied and nonstudied stimuli were counterbalanced across participants, resulting in a total of eight study lists. This study phase was identical for all participants, regardless of whether they later performed a recognition memory test or a cued-recall test.
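The trial counts reported above follow directly from the category structure. As a minimal sketch of that arithmetic (the variable names are illustrative and not part of the original materials; only the numbers come from the Method):

```python
# Stimulus counts implied by the design described in the Method.
# Variable names are hypothetical; only the numbers come from the text.
N_VALENCES = 3                # negative, neutral, positive
CATEGORIES_PER_VALENCE = 15
ITEMS_PER_CATEGORY = 8
STUDIED_PER_CATEGORY = 5
NONSTUDIED_PER_CATEGORY = ITEMS_PER_CATEGORY - STUDIED_PER_CATEGORY

n_categories = N_VALENCES * CATEGORIES_PER_VALENCE
n_items = n_categories * ITEMS_PER_CATEGORY
n_studied = n_categories * STUDIED_PER_CATEGORY
n_nonstudied = n_categories * NONSTUDIED_PER_CATEGORY

print(n_categories, n_items, n_studied, n_nonstudied)  # 45 360 225 135
```

The studied and nonstudied counts sum to the 360 words later shown in the recognition test.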

Filled delay

After the participants completed the encoding phase, they performed a numeric distractor task for 30 min, in order to prevent ceiling effects on the subsequent memory test. The 30-min delay was selected on the basis of pilot data that calibrated the performance levels.

Retrieval

Self-paced recognition or cued-recall retrieval tasks were used for two different groups of participants. For the recognition task, the participants were presented with a list of 360 words (without the corresponding pictures and category names), including both studied and nonstudied words, which were presented in a different intermixed order from the one on the study list. In each trial, they were instructed to select “old” if they had seen the word in the earlier study phase, and “new” if they had not seen the word. They also rated their confidence right after each old/new decision, on a scale of 1–5 (1 = least confident, 3 = somewhat confident, 5 = very confident).

For the cued-recall task, each participant was given a spreadsheet in which each category name was prelabeled on its own tab. They were instructed to type as many words as they could recall in each tab and then to move on to the next category, and they were clearly instructed not to return to a previous category once they had moved on.

Valence ratings

After completion of the retrieval tests, all participants were once again presented with all 360 stimuli (studied and nonstudied) in word form in randomized order, and they rated each stimulus for valence on a scale of 1–3 (1 = negative, 2 = neutral, 3 = positive). These ratings were used to check whether participants’ valence ratings corresponded with the valence preassigned to each item.

Results

We conducted repeated measures ANOVAs with Valence (negative, neutral, or positive) as a within-subjects factor, separately for the recognition and cued-recall tasks. The overall pattern of results for the cued-recall task was identical to that for the recognition task. Because the intrusion rates in cued recall revealed a floor effect (M = .01, SE = .004), we will report the results in detail only for the recognition task (see Footnote 1).
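For readers less familiar with the analysis, a one-way repeated measures ANOVA with three valence levels and 24 participants per task yields the degrees of freedom (2, 46) that appear throughout the Results. The following is a minimal sketch of the standard computation, not the authors' analysis script; the function name and the simulated scores are hypothetical, and the p value is omitted because it requires the F distribution.

```python
import random

def rm_anova_oneway(scores):
    """One-way repeated measures ANOVA.

    scores: list of per-participant rows, one value per condition.
    Returns (F, df_effect, df_error, MSE).
    """
    n = len(scores)        # participants
    k = len(scores[0])     # conditions (e.g., three valences)
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # condition x participant residual

    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    mse = ss_error / df_error
    f_value = (ss_cond / df_effect) / mse
    return f_value, df_effect, df_error, mse

# Simulated hit rates for 24 participants x 3 valences (NOT the study's data).
random.seed(0)
fake = [[random.uniform(0.6, 0.9) for _ in range(3)] for _ in range(24)]
f_value, df_effect, df_error, mse = rm_anova_oneway(fake)
print(df_effect, df_error)  # 2 46
```

The error term here is the Valence × Participant residual, which is the source of the MSE values reported alongside each F.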

Goodness-of-fit ratings

The overall mean goodness-of-fit ratings at encoding for all of the items were quite high (M = 4.11, from good to outstanding fits). The means for each valence (4.15 for negative, 4.06 for neutral, and 4.12 for positive) did not differ, F(2, 46) = 1.31, p = .28, indicating that participants’ perceived category memberships of all items were equally strong across the valence categories.

Task performance

Table 1 displays the mean proportions of hits and false alarms. A repeated measures ANOVA revealed a significant main effect of valence (negative, neutral, or positive) on hits, F(2, 46) = 5.23, MSE = .002, p < .01. Subsequent t tests revealed that the hit rates for negative items were significantly higher than those for neutral items, t(23) = 3.30, SEM = .013, p < .01, but did not differ from the hit rates for positive items, t(23) = 1.61, SEM = .015, p = .12. The hit rates for positive and neutral items did not differ, either, t(23) = 1.56, SEM = .011, p = .13. These patterns are consistent with the previous findings that young adults tend to recognize valenced information better than neutral information, and that this advantage can be stronger for negatively valenced items (e.g., Ochsner, 2000). However, interestingly, we found no main effect of valence on false alarms, F(2, 46) = 1.30, MSE = .001, p = .28, indicating the absence of a conceptual overlap effect related solely to valence. We will return to a discussion of these patterns.
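The follow-up comparisons are paired t tests with 23 degrees of freedom (24 participants minus 1). A minimal sketch of that computation is given below; it assumes that the reported SEM is the standard error of the paired difference, which the text does not state explicitly, and the function name and example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Paired-samples t test on two equal-length lists of scores.

    Returns (t, sem, df), where sem is the standard error of the
    mean difference (an assumed reading of the reported SEM).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sem = stdev(diffs) / sqrt(n)
    return mean(diffs) / sem, sem, n - 1

# Hypothetical hit rates for four participants (NOT the study's data).
negative = [0.80, 0.75, 0.90, 0.85]
neutral = [0.70, 0.72, 0.80, 0.78]
t, sem, df = paired_t(negative, neutral)
print(df)  # 3
```

Under that assumption, multiplying t by SEM recovers the mean difference; for example, 3.30 × .013 ≈ .043 for the negative versus neutral hit rates.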

Table 1 Means and standard errors (in parentheses) of test performance, memory sensitivity (d'), and response bias (C)

The patterns observed for corrected recognition (hit minus false alarm rates; see Fig. 1) and for memory sensitivity measures, d' (Table 1), across valences were consistent. For economy, we will elaborate on the corrected-recognition measures in all experiments (the full d' analyses from all three experiments are available in supplemental materials). A repeated measures ANOVA revealed a significant effect of valence, F(2, 46) = 5.17, MSE = .003, p = .01, consistent with the analysis of hit rates. The corrected recognition rates for negative (M = .64) and positive (M = .63) items were higher than those for neutral items (M = .59), t(23) = 2.96, SEM = .016, p < .01, and t(23) = 2.49, SEM = .014, p < .05, respectively, and the rates for negative and positive items did not differ, t < 1. The response bias (C) analyses showed somewhat conservative criteria in all three valence conditions (all means above zero; see Table 1), with a significantly less conservative criterion for negative than for neutral items, t(23) = 2.14, SEM = .044, p < .05, and a marginally less conservative criterion for negative than for positive items, t(23) = 2.02, SEM = .057, p = .06. The neutral and positive items did not differ, t < 1.
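The three measures discussed above can be computed from a hit rate and a false alarm rate using the standard equal-variance signal detection formulas, d' = z(H) − z(FA) and C = −[z(H) + z(FA)]/2. The sketch below is illustrative only: the function name and the example rates are hypothetical, and it does not reproduce any correction the authors may have applied for rates of 0 or 1.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def recognition_measures(hit_rate, fa_rate):
    """Return (corrected recognition, d', C).

    Both rates must lie strictly between 0 and 1.
    """
    z_hit, z_fa = _z(hit_rate), _z(fa_rate)
    corrected = hit_rate - fa_rate
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)  # > 0 means a conservative criterion
    return corrected, d_prime, criterion

# Hypothetical rates (NOT taken from Table 1).
corrected, d_prime, criterion = recognition_measures(0.70, 0.10)
print(round(corrected, 2), criterion > 0)  # 0.6 True
```

A positive C, as in this example, corresponds to the "somewhat conservative" criteria described in the text: participants respond "old" less often than an unbiased observer would.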

Fig. 1 Corrected recognition (hits minus false alarms) in Experiment 1 as a function of valence, with a 30-min delay following goodness-of-fit ratings at encoding

Confidence ratings

The overall confidence ratings were quite high (M = 4.42, from confident to very confident; see Table 1) and, as expected, a repeated measures ANOVA with Valence and Response Type (hits/false alarms) as factors revealed that the participants were more confident in their hits than their false alarms, F(1, 23) = 37.38, MSE = .540, p < .001. However, the confidence ratings did not differ across valences, F < 1.

Valence ratings

The overall mean valence ratings for words differed across the three valence categories: 1.76 (negative), 2.23 (neutral), and 2.40 (positive), F(2, 46) = 93.40, MSE = .028, p < .001. A strong correlation emerged between the valence that we assigned and the valences that participants rated for the items used in the study, r = .74, p < .01, suggesting that individuals’ subjective judgments on valence (e.g., the word spider was judged as being negative by one, but may have been judged as being neutral or positive by another) were not a confounding factor in evaluating our data.

Discussion

This experiment showed that recognition memory was better for emotional than for neutral items, even when category membership assessments were equivalent across valences. These patterns are consistent with the classic effects that memory is better for emotional than for neutral items (LaBar & Cabeza, 2006) and that the advantage can be stronger for negatively valenced items (Ochsner, 2000).

Because thematic relatedness was controlled in the present experiment for both emotional and neutral stimuli, the present findings provide novel evidence that emotional enhancement of memory can occur even under these conditions. Past evidence using the free-recall task has shown that this memory advantage for emotional information disappears when emotional words (Talmi & Moscovitch, 2004) or pictures (Talmi et al., 2007) are compared to categorically organized neutral stimuli. In our recognition memory task, when the organizational structures of both emotional and neutral stimuli were equated, we observed better true memory for emotional than for neutral stimuli. This outcome shows an effect of valence on recognition memory over and above the effect of organizational structures, even when the categorical structures for emotional versus neutral items were held in the most stringent equivalence.

The advantage for emotional items in true memory for thematically controlled stimuli could be supported by both the enhanced distinctiveness and the enhanced conceptual relatedness of emotional items, raising questions about the role of valence in reducing false alarms (the distinctiveness heuristic) or increasing them (conceptual relatedness of valences). As we discussed earlier, some past studies have shown that emotional content could suppress false memory (Kensinger & Corkin, 2004; Pesta et al., 2001), yet their results opened the question of whether this suppression was due to emotional or conceptual distinctiveness. Other studies have shown that emotional content increases false memory (Brainerd et al., 2008; Gallo et al., 2009), yet they also opened the question of whether the conceptual relatedness was due to the categorizable nature of the valenced stimuli or to the use of valence as a category. After holding both thematic distinctiveness and relatedness relatively constant across valences by adding category structure into the stimuli, we found that false recognitions were equivalent for both emotional and neutral items, even when accurate recognition replicated the standard emotional memory advantage. These results suggest that when conceptual distinctiveness is controlled, these emotional items do not have enough distinctiveness to function as cues to decrease false recognition.

Furthermore, the claim based on the conceptual-relatedness hypothesis that emotional content has stronger conceptual similarities than neutral content, and thus that it increases memory confusion, is also not supported. It is possible that the effects of conceptual relatedness and the ability to use emotional distinctiveness act in opposition, canceling out each other’s effects on false-memory rates, so that, on balance, no effect of valence emerges (see Kelley & Wixted, 2001, who also proposed that strengthening associations can enhance hit rates but have no effect on false alarm rates). That is, our categorically bound stimuli offset the grouping by valence, and thus prevented valence from overriding category information in influencing false memory.

Experiment 2

To reexamine the lack of a valence effect on false memory and to expand on the findings from Experiment 1, we conducted Experiment 2 with a longer, 24-h retention interval. Recent research has suggested that at shorter delays, the effects of emotion on attention, organization, and distinctiveness may be sufficient to account for the effects of emotion on memory (Talmi & McGarry, 2012). By contrast, after longer delays, the memory enhancement for emotional content has been attributed to a modulation of consolidation processes (see McGaugh, 2000, for a review). In fact, studies have shown that memory enhancement for emotional content increases as the retention interval increases and that this increased effect of emotion on memory requires the amygdala (LaBar & Phelps, 1998; Canli, Zhao, Brewer, Gabrieli, & Cahill, 2000). In Experiment 2 we investigated, as the retention interval increased, (a) whether the true-memory enhancement for emotional content would still remain intact, and (b) whether the effect of valence-based conceptual relatedness would emerge to increase false memory or, rather, whether emotional distinctiveness would reduce false memory for emotional as compared to neutral items.

Method

Participants and design

A group of 48 Stony Brook University students participated for course credit, with 24 participants each in the recognition and cued-recall task conditions.

Procedure

The procedure was identical to that used in Experiment 1, except for the retention interval. In this experiment, after the participants completed the encoding phase, they were asked to come back 24 h later to complete either the recognition or the cued-recall task.

Results

Repeated measures ANOVAs were conducted with Valence (negative, neutral, or positive) as a within-subjects factor, separately for the recognition and cued-recall tasks. The intrusion rates in cued recall were at floor, even after 24-h retention (M = .05, SE = .008), and so, once again, we will only report the results for the recognition task (see Footnote 2).
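As a rough illustration of this analysis design (not the authors' analysis code), a one-way repeated measures ANOVA with valence as the within-subjects factor can be sketched as follows. The function name `rm_anova_oneway` and the data layout are assumptions made for this example, and the scores are toy values, not data from the study. With 24 participants and three valence levels, the degrees of freedom come out as (2, 46).

```python
from statistics import mean


def rm_anova_oneway(scores):
    """One-way repeated measures ANOVA (hypothetical helper).

    scores: list of per-participant lists, one value per condition,
            e.g. [negative, neutral, positive].
    Returns (F, df_effect, df_error, mse).
    """
    n = len(scores)        # number of participants
    k = len(scores[0])     # number of within-subjects conditions

    grand = mean(v for row in scores for v in row)
    cond_means = [mean(row[j] for row in scores) for j in range(k)]
    subj_means = [mean(row) for row in scores]

    # Partition the total sum of squares into condition, subject, and error.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    mse = ss_error / df_error
    f_value = (ss_cond / df_effect) / mse
    return f_value, df_effect, df_error, mse


# Toy data: 3 participants x 3 conditions (illustrative values only).
toy_scores = [[1, 2, 3], [2, 3, 5], [3, 3, 4]]
f_toy, df1_toy, df2_toy, mse_toy = rm_anova_oneway(toy_scores)
```

The error term here is the participant-by-condition interaction, which is why the error degrees of freedom are (k - 1)(n - 1) rather than the between-subjects value.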

Goodness-of-fit ratings

The overall mean goodness-of-fit ratings for all of the items were quite high (M = 3.73, from good to very good fits). Participants in this experiment perceived category membership differently across valences, F(2, 46) = 8.48, MSE = .057, p = .001: They rated the category memberships of positive items (M = 3.90) as being stronger than those of negative (M = 3.64), t(23) = 3.78, SEM = .067, p = .001, or neutral (M = 3.66) items, t(23) = 3.92, SEM = .060, p = .001. No such difference emerged between negative and neutral items, t < 1.

Task performance

Table 1 displays the mean proportions of hits and false alarms. A repeated measures ANOVA revealed a significant main effect of valence (negative, neutral, or positive) on hits, F(2, 46) = 6.76, MSE = .003, p < .01, which was driven by higher hit rates for negative items (M = .73) than for neutral (M = .67) and positive (M = .70) items, t(23) = 3.58, SEM = .016, p < .01, and t(23) = 2.10, SEM = .015, p < .05, respectively. We found no such difference between positive and neutral items, t(23) = 1.64, p = .12. These outcomes suggest that memory for negative items was preserved better than that for neutral or positive items across the delay.

Interestingly, now there was a main effect of valence on false alarms that had been absent at the short, 30-min delay (Exp. 1), F(2, 46) = 7.11, MSE = .003, p < .01; the pattern showed that the neutral items were falsely recognized more frequently than were the negative and positive items, t(23) = 2.70, SEM = .019, p < .05, and t(23) = 3.55, SEM = .015, p < .01, respectively. No significant difference between the negative and positive items emerged for false alarm rates, t < 1.
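The pairwise comparisons above report a t value together with the standard error of the mean difference (SEM). A minimal sketch of that relationship, assuming a standard paired-samples t test, is given below. The helper name `paired_t` and the example rates are hypothetical and do not come from the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(cond_a, cond_b):
    """Paired-samples t test (hypothetical helper).

    cond_a, cond_b: per-participant scores in two conditions, same order.
    Returns (t, df, sem), where sem is the standard error of the
    mean difference and t = mean difference / sem.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    sem = stdev(diffs) / sqrt(n)   # sample SD of the differences over sqrt(n)
    return mean(diffs) / sem, n - 1, sem


# Illustrative false alarm rates for four imaginary participants.
neutral_fa = [0.70, 0.80, 0.75, 0.90]
negative_fa = [0.60, 0.75, 0.70, 0.80]
t_demo, df_demo, sem_demo = paired_t(neutral_fa, negative_fa)
```

With 24 participants, the same computation yields 23 degrees of freedom, as in the t(23) tests reported for the hit and false alarm rates.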

Analyses on the corrected recognition (hits minus false alarms) measure showed patterns consistent with the same analyses in Experiment 1 (Fig. 2). A repeated measures ANOVA revealed a significant effect of valence, F(2, 62) = 4.93, MSE = .005, p = .01. The corrected recognition rates for negative (M = .67) and positive (M = .68) items were significantly higher than the rate for neutral items (M = .63), t(31) = 2.39, SEM = .02, p < .05, and t(31) = 3.36, SEM = .02, p < .01, respectively; and the same rates for negative and positive items did not differ, t < 1. The d' scores (Table 1) showed the same statistical patterns. The response bias (C) scores were once again somewhat conservative in all three valence conditions (above zero, Table 1) and did not differ across valences, F(2, 46) = 1.39, MSE = .016, p = .26.
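For readers less familiar with these measures, the sketch below shows how corrected recognition, d', and the response bias C are conventionally derived from a hit rate and a false alarm rate. It assumes the standard equal-variance signal detection formulas. The text does not state the exact formulas or any correction for extreme rates that the authors used, so this is an assumption. The function name `recognition_measures` and the false alarm rate in the example are hypothetical.

```python
from statistics import NormalDist


def recognition_measures(hit_rate, fa_rate):
    """Standard recognition measures (hypothetical helper).

    Rates must lie strictly between 0 and 1; rates of exactly 0 or 1
    would need a correction before the z-transform, which is not
    handled in this sketch.
    """
    z = NormalDist().inv_cdf            # inverse of the standard normal CDF
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    return {
        "corrected": hit_rate - fa_rate,        # hits minus false alarms
        "d_prime": z_hit - z_fa,                # sensitivity
        "criterion_c": -0.5 * (z_hit + z_fa),   # response bias
    }


# Hit rate of .73 paired with an assumed (not reported) false alarm rate of .20.
demo = recognition_measures(0.73, 0.20)
```

Under this convention, a value of C above zero indicates a conservative criterion, meaning a tendency to respond "new", which is the sense in which the bias scores are described as conservative.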

Fig. 2. Corrected recognition (hits minus false alarms) in Experiment 2 as a function of valence, with a 24-h delay following goodness-of-fit ratings at encoding

Confidence ratings

As expected, participants were more confident in hits than in false alarms, F(1, 23) = 12.36, MSE = .35, p < .01, and reported equal confidence across valences, F(2, 46) = 1.19, MSE = .108, p = .31.

Valence ratings

The overall mean valence ratings for words differed across the three valence categories: 1.79 (negative), 2.22 (neutral), and 2.38 (positive), F(2, 46) = 64.51, MSE = .058, p < .001. As was the case in Experiment 1, a strong correlation emerged between the valence that we assigned and the valences that participants rated for the items used in the study, r = .69, p < .01.

Discussion

Overall, the memory enhancement for emotional items was preserved well across delays, and this was especially true for the negatively valenced items. It is noteworthy that hit rates were higher for negatively valenced items, even though participants in this recognition task perceived the strongest category membership for positively valenced items. These patterns are consistent with the findings from Experiment 1, and thus again suggest that the strong thematic relatedness that was perceived for positive items neither affected nor overrode the enhanced memory for the negatively valenced items. This result is also consistent with previous studies that have investigated the role of study–test delay on emotional memory and shown elevated effects of arousal on memory across longer delays by using only negatively valenced stimuli (LaBar & Phelps, 1998). Some of those studies emphasized that postencoding processes, such as continued elaboration and rehearsal, can enhance the memory for emotional materials, especially among young adults (Libkuman, Stabler, & Otani, 2004), and that those postencoding actions are affected by encoding processes, such that items that are more deeply encoded are more likely to be consolidated over the long term. In the present study, encoding was done both thematically (by perceiving and making decisions on the category membership) and emotionally (by perceiving the emotionality of valenced stimuli). Given the highest category membership ratings for positively valenced items from this set of participants, one could argue that the participants would have encoded positive items more deeply in a theme-specific way than they did the negative or neutral items, and thus that the positive items would be consolidated better than the negative or neutral items. However, our results revealed the opposite pattern, with better memory for negative than for positive items. This outcome indicates that, over the relatively long retention interval, thematically driven encoding for positive items was overridden by the emotionally driven encoding for negative items.

Our findings on the increased false recognition of neutral items as compared to emotional items across a delay stand in contrast to the previous findings by Gallo et al. (2009) of increased false recognition of emotional as compared to neutral items across the same 24-h delay. The discrepancy may have resulted mainly from a critical difference in the nature of the stimuli used: the presence of categorical features in both the emotional and neutral stimuli used in the present study. As we noted earlier, the stimuli used by Gallo et al. (2009) did not specifically include categorical information. This could have resulted in their emotional stimuli being more related than their neutral stimuli (as was suggested by their participants’ ratings) or could have led their participants to group valenced items together by using valence as a categorical cue (e.g., pleasant items, unpleasant items, etc.), as the authors suggested. That would have caused more confusion between the studied and nonstudied emotional items, and in turn would have led to enhanced false recognitions for the emotional items. However, the categorized stimuli used in the present study allowed for grouping by category as well as by valence, thereby allowing a test of the influence of valence on suppressing or enhancing false memories.

In brief, the delay manipulation revealed an advantage for the valenced items (especially negatively valenced items) in both promoting hits and reducing false alarms for these items.

Experiment 3

The results of Experiments 1 and 2 converged to show that the categorically bound content did not increase false recognition for emotional content. However, one could argue that the absence of an emotional boost on false memory might have resulted from the conceptual categorization task at encoding (i.e., goodness-of-fit ratings), which could have eclipsed any effect of emotional categorization on false recognition. We conducted Experiment 3 to address this possibility by employing an encoding task that focused participants on the emotional categorization of the stimuli rather than on their thematic relations, followed by a recognition memory test.

Method

Participants and design

A group of 36 Stony Brook University students participated for course credit.

Procedure

The procedure was the same one that had been used in Experiment 1 for the recognition memory task, except that the encoding task was changed. In Experiment 3, category names were not given at encoding, and the participants were asked to rate each word for arousal on a scale of 1–5 (1 = low, 3 = medium, 5 = high; see Gallo et al., 2009) instead of rating each word for its goodness of fit to a category.

Results

Arousal ratings

The participants rated positive (M = 2.69, SE = .12) and negative (M = 2.56, SE = .15) items as being equally arousing, t < 1. In comparison to the neutral items (M = 2.36, SE = .11), positive items were rated as significantly more arousing, t(31) = 6.33, SEM = .05, p < .001. The numerically higher arousal ratings for the negative items did not reach statistical significance as compared to the neutral items, p = .17 (see Footnote 3).

Task performance

Table 1 displays the mean proportions of hits and false alarms, and Fig. 3 displays the mean proportions of corrected recognition (hits – false alarms) as a function of valence. A repeated measures ANOVA conducted on hit rates indicated a main effect of valence, F(2, 62) = 6.59, MSE = .002, p < .01. Subsequent t tests revealed significantly higher hit rates for negative than for neutral items, t(31) = 3.69, SEM = .011, p < .01, and higher hit rates for positive than for neutral items, t(31) = 3.23, SEM = .011, p < .01. No difference between negative and positive items emerged, t < 1. The higher hit rates for negative and positive items than for neutral items are consistent with the results from our previous experiments, in which the encoding task involved a conceptual categorization (i.e., goodness-of-fit ratings to categories). Also consistent with the results from our Experiment 1, in which the same recognition task was administered after the same delay, the effect of valence on false alarms continued to be absent, F(2, 62) = 1.63, MSE = .002, p = .20. Therefore, enhanced true memory for valenced items and the absence of a valence effect on false memory persisted, regardless of the different encoding tasks employed.

Fig. 3. Corrected recognition (hits minus false alarms) in Experiment 3 as a function of valence, with a 30-min delay following arousal ratings at encoding

Analyses on the corrected recognition measure revealed patterns consistent with those of Experiments 1 and 2, and the patterns of d' scores (Table 1) also converged once again. The effect of valence on corrected recognition was significant, F(2, 62) = 4.93, MSE = .004, p = .01; subsequent t tests revealed that corrected recognition was higher for both negative and positive items than for neutral items, t(31) = 2.39, SEM = .015, p < .05, and t(31) = 3.36, SEM = .015, p < .01, respectively. We found no difference between negative and positive items, t < 1. The response bias (C) analyses once again showed the use of somewhat conservative criteria (means above zero in all three valence conditions, Table 1), with a significantly less conservative criterion in the negative than in the neutral valence condition, t(31) = 2.05, SEM = .042, p < .05.

Confidence ratings

Participants were more confident in hits than in false alarms, F(1, 28) = 57.28, MSE = .988, p < .001. Their reported confidence was different across valences, F(2, 56) = 8.64, MSE = .165, p < .01: The participants were more confident in their judgments for negative than for neutral items, t(31) = 3.30, SEM = .030, p < .01, and also in those for positive than for neutral items, t(31) = 4.29, SEM = .022, p < .001. The confidence ratings on negative and positive items did not differ, t < 1.

Valence ratings

The overall mean valence ratings for words differed across the three valence categories: 1.68 (negative), 2.11 (neutral), and 2.30 (positive), F(2, 62) = 105.77, MSE = .03, p < .001. As was the case in Experiments 1 and 2, we found a strong correlation between the valence that we assigned and the valences that the participants rated for the items used in the study, r = .81, p < .01.

Discussion

The purpose of Experiment 3 was to eliminate possible effects of the categorization-salient encoding task used in Experiments 1 and 2, which could have overpowered our manipulation of using categorized stimuli. In Experiment 3, the emotional enhancement effect on true memory remained robust, and the effect of valence on false memory was once again absent, even though participants performed emotionality-based processing without reference to the category labels at encoding.

In Experiments 1 and 2, it was possible that the goodness-of-fit ratings during encoding overemphasized conceptual categorization and, in turn, weakened the processing of valence-based categories, which could have increased false recognition for emotional items. However, such an overactivation of thematic categorization should have been eliminated in Experiment 3, as no category labels were provided at encoding, which consequently made the thematic categorization less obvious. Thus, the persistent absence of an emotional boost on false recognition supports the possibility that the stimulus selection and construction can play an important role in producing the emotional boost on false memory reported in prior studies (Brainerd et al., 2008; Gallo et al., 2009).

Conclusion

Across three experiments, we found that when the study materials were matched for thematic relations across both emotionally arousing and neutral, nonarousing items, emotion enhanced memory accuracy but did not increase memory distortion. The former finding of enhanced true memory is an important replication of the classic memory enhancement effect for emotional content that we observed when we controlled both the emotional and neutral stimuli for thematic relatedness. In the context of this replication, the absence of an emotional boost on false recognition provides theoretical insights into the mechanisms that underlie false-memory formation for emotional content. As we discussed earlier, findings in this domain have been mixed and have seemed to strongly depend on the types of stimuli used, as well as on the specific aspects of the stimuli that were salient during encoding (e.g., emotional distinctiveness, conceptual distinctiveness, or semantic relatedness). Increased false memory for emotional content (e.g., Gallo et al., 2009) has often been explained by the conceptual-relatedness hypothesis, according to which emotion increases the conceptual association across items and enables the process of emotion-based grouping, which in turn increases memory confusions among emotional stimuli. In the present study, we selected stimuli that were all thematically related to one another through categorical structures, to test whether the emotion-based grouping was powerful enough to override other forms of conceptual grouping.

The absence of an emotional effect on false recognition (Exps. 1 and 3) is important, in that it identifies conditions in which emotion fails as a grouping cue to override categorical information and increase confusability among items. Such emotion-based grouping did not exert an effect, even when the emotional aspects of the stimuli were salient at encoding (Exp. 3).

If anything, after a longer retention interval, emotional items showed a false-memory suppression effect, in that false recognition was lower for valenced than for neutral items (Exp. 2). A possible explanation for this pattern is that the distinctiveness of emotional items remained intact as emotional consolidation effects became relatively stronger across the longer retention interval, which in turn helped participants to use distinctiveness to suppress false recognition of nonstudied emotional words.

Overall, our findings converged across the encoding tasks that we used, to suggest that the influence of emotional valence on false memory is modulated by a boundary condition according to which valence itself does not eclipse other types of grouping, while the well-established memory enhancement effects of valence on true memory are still observed. Together with past research, the present findings specify the nuanced ways in which emotion shapes memory and the delimiting role of emotion in the development of false memory.