False memory refers to memory for events that never actually happened. Although most instances of everyday false memory are innocuous, memory distortions can have serious consequences in some high-stakes situations in which memory accuracy is paramount, such as eyewitness identifications, police interrogations, and medical histories given during emergency treatments. An important objective of false memory research is to produce empirical findings and validate theoretical principles that can be applied to these high-stakes situations. For instance, police interrogation (e.g., Kassin et al. 2010), eyewitness interviews and identifications (e.g., Bruck and Ceci 1999; Poole and Lamb 1998; Wixted and Wells 2017), and psychotherapy (e.g., Lindsay and Read 1994; Lynn et al. 2003) have been the foci of such work.

A hallmark of high-stakes remembering situations is that they generally involve emotion, either in the sense that the contexts are inherently emotional (e.g., police interviews) or in the sense that the events being remembered are emotional (e.g., violent crimes) or both (Bookbinder and Brainerd 2016; Brainerd and Reyna 2005). Naturally, that has stimulated research interest in how the emotional content of experience affects false memory.

Mixed findings in emotion-false memory research

In order to manipulate emotional content in experimentation, it is first necessary to agree upon the psychological dimensions of emotional content. Since Wundt (1912), it has been widely thought that at least two types of information are processed when we perceive emotional content in target materials—valence (positivity-negativity) and arousal (calming-exciting). In contemporary psychology, this idea was formalized in the circumplex model of emotion (Posner et al. 2005; Russell 1980), which posits that emotional experiences are mixtures of different values of valence and arousal.

Consequently, emotion-false memory research has revolved around the question of how false memory is influenced by variations in the valence of target material and, to a far lesser extent, by variations in its arousal. Thus far, however, experimental findings about valence effects on false memory remain inconsistent (for a review, see Bookbinder and Brainerd 2016). On the one hand, the results of some studies suggest that false memory levels are higher for emotionally valenced (positive/negative) materials than for comparable neutral materials (Bookbinder and Brainerd 2017; Brueckner and Moritz 2009; Gallo et al. 2009; Sharkawy et al. 2008). On the other hand, the results of other studies suggest that false memory levels are lower for emotionally valenced (positive/negative) materials than for neutral materials (Choi et al. 2013; Howe et al. 2010; Palmer and Dodson 2009; Pesta et al. 2001).

The exact causes of these inconsistent outcomes are not well understood, but a lack of methodological standardization across experiments is one likely contributor. A helpful first step in methodological standardization would be to develop a simple, uniform procedure that is easy to implement in a broad range of situations. An obvious candidate is a list-learning procedure that was originally devised by Deese (1959), which is currently the most widely used methodology in false memory research (Gallo 2006). In this procedure, subjects study lists of 12-15 related words (e.g., bed, rest, awake, tired, dream, …). All of the list words are forward associates of a word that is not presented on the list (sleep), which is called the critical distractor or critical lure. Deese (1959) found that the critical distractors were falsely remembered at surprisingly high levels on immediate free recall tests. More than three decades later, Roediger and McDermott (1995) replicated the free recall results and found that the critical distractors were also falsely remembered at high levels on recognition tests.

Early DRM experiments in emotion-false memory research

In an early study, Budson et al. (2006) adapted the Deese/Roediger/McDermott (DRM; Deese 1959; Roediger and McDermott 1995) procedure by creating a set of emotional (negative) DRM lists, along with a parallel set of neutral DRM lists. The procedure of selecting words that are forward associates of a critical distractor was preserved, but the words on Budson et al.’s emotional lists had clear affective content (e.g., the sick list includes cough, fever, ill,…), whereas those on their neutral lists did not (e.g., the chair list includes table, sit, desk,…). Budson et al. found that false recognition of critical distractors was equivalent between the emotional and neutral lists. However, using a subset of Budson et al.’s lists, Howe (2007) and Howe et al. (2010) found that emotional lists produced higher false recognition and lower false recall than neutral lists. Sharkawy et al. (2008) replicated Howe et al.’s finding for false recognition, but they found no difference in false recall between the two types of lists.

To understand these mixed findings, it is important to note that the Budson et al. (2006) methodology has two limitations. First, some lists in the neutral pool are not truly neutral, as their critical distractors receive highly positive ratings in norms for emotional words (e.g., soft, girl and sweet; Bookbinder and Brainerd 2016). According to Warriner et al.’s (2013; WKB) word norms, among Budson et al.’s 10 neutral critical distractors, 6 received mean valence ratings > 7 on a 9-point unhappy ➔ happy scale. Thus, although Budson et al.’s (2006) emotional lists are clearly negative, their neutral lists are actually a mixture of neutral and positive words. The other limitation is that the emotional and neutral list pools confound valence with arousal. As Brainerd et al. (2008a) pointed out, although there is a valence difference such that the words on emotional lists are on average more negative than the words on neutral lists, the words on emotional lists are also on average more arousing than those on neutral lists. According to the WKB norms, the mean valence ratings (on a 9-point unhappy ➔ happy scale) of Budson et al.’s (2006) emotional and neutral critical distractors are 2.69 and 6.47, respectively, and the corresponding mean arousal ratings (on a 9-point calm ➔ exciting scale) are 5.59 and 3.48, respectively. Therefore, it is uncertain whether the memory effects that were produced by the two types of lists were due to the valence difference, the arousal difference, both, and/or the interaction between them.

Cornell/Cortland emotional lists

In order to overcome the limitations discussed above, Brainerd et al. (2008b) developed a new pool of emotional DRM lists, the Cornell/Cortland Emotional Lists (CEL; see Table 1). The CEL consists of 32 DRM lists, which are subdivided into four sets of lists: eight negative/high arousal lists, eight negative/low arousal lists, eight positive/high arousal lists and eight positive/low arousal lists. The CEL was initially constructed using Nelson et al.’s (2004) norms of word association and the affective norms for English words (ANEW; Bradley and Lang 1999). The Nelson norms were used to isolate groups of words whose mean backward associative strength (MBAS; the mean probability for the list words to elicit the critical distractor as forward associates) is sufficiently high. This is critical for generating significant levels of false memory, because MBAS is the best-known predictor of the level of false memory that a DRM list induces (Hutchison and Balota 2005; Roediger et al. 2001). As shown in Table 2, the MBAS values of the four sets of lists are .26, .23, .21, and .26, respectively, which are adequate to produce reliable levels of false recognition and false recall according to existing DRM norms (Roediger et al. 2001). The ANEW norms were used to identify words that vary in valence and arousal. When the lists were first constructed, many of the words had not been rated in the ANEW norms, and hence, their valence and arousal ratings were unknown. Eventually, however, Warriner et al. (2013) published a follow-up of the ANEW norms that included 13 times as many words. We have taken advantage of the Warriner et al. (WKB) norms to identify the mean valence and arousal scores for nearly all of the CEL list words and all of the critical distractors in Table 1.
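As a rough illustration of how MBAS summarizes a list, the sketch below averages per-word backward associative strengths. The function name and the example values are hypothetical and are not taken from the Nelson norms or from Table 1.

```python
from statistics import mean


def mean_backward_associative_strength(bas_by_word):
    """Average backward associative strength (BAS) over the words of one DRM list.

    bas_by_word maps each list word to the normed probability that it elicits
    the critical distractor in free association (0.0 if it never does).
    """
    if not bas_by_word:
        raise ValueError("a DRM list needs at least one word")
    return mean(bas_by_word.values())


# Hypothetical BAS values for a four-word illustration (not real norm data).
example_list = {"word_a": 0.5, "word_b": 0.3, "word_c": 0.2, "word_d": 0.0}
mbas = mean_backward_associative_strength(example_list)
print(f"MBAS = {mbas:.2f}")
```

Words that never elicit the critical distractor still count toward the denominator, which is why a list with a few strong associates can nonetheless have a modest MBAS.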

Table 1 The 32 Cornell/Cortland emotional lists and their critical distractors with mean valence, standard deviation valence, mean arousal, standard deviation arousal and backward association strength (BAS)
Table 2 Valence-arousal counterbalancing and BAS of the Cornell/Cortland emotional lists

The most important feature of the CEL is that the valence and arousal scores for both critical distractors and list words are counterbalanced across the four sets of lists (see Table 2), allowing researchers to disentangle the effects of valence and arousal on both true memory (recall/recognition of list words) and false memory (recall/recognition of critical distractors). The 2 (valence: positive, negative) × 2 (arousal: high, low) analyses of variance (ANOVAs) revealed that the mean valence of the list words is lower for the 16 negative lists than for the 16 positive lists, F(1, 28) = 42.72, MSE = .46, p < .0001, partial η2 = .60. The mean arousal of the list words is higher for the 16 high arousal lists than for the 16 low arousal lists, F(1, 28) = 18.24, MSE = .17, p = .0002, partial η2 = .39. Also, the mean valence of the critical distractors is lower for the 16 negative lists than for the 16 positive lists, F(1, 28) = 102.25, MSE = .91, p < .0001, partial η2 = .79. The mean arousal of the critical distractors is higher for the 16 high arousal lists than for the 16 low arousal lists, F(1, 28) = 37.36, MSE = .50, p < .0001, partial η2 = .57.
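Readers who wish to check the reported effect sizes can recover partial η2 from an F ratio and its degrees of freedom using the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the four statistics above. The text does not state how the effect sizes were computed, so this is offered only as a consistency check; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    explained = f_value * df_effect
    return explained / (explained + df_error)


# F ratios reported above, each with df = (1, 28).
reported_f = {
    "list-word valence": 42.72,
    "list-word arousal": 18.24,
    "critical-distractor valence": 102.25,
    "critical-distractor arousal": 37.36,
}
for label, f_value in reported_f.items():
    print(f"{label}: partial eta squared = {partial_eta_squared(f_value, 1, 28):.2f}")
```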

Emotional ambiguity hypothesis

Recent research on the cognitive effects of the valence and arousal has stressed the importance of variability in items’ average valence and arousal intensity ratings. Specifically, according to the emotional-ambiguity hypothesis (Brainerd 2018; Mattek et al. 2017), the relation between valence and arousal intensity and their effects on other psychological processes depend upon a variable that has been neglected in prior research: valence ambiguity. Valence ambiguity is the level of uncertainty in people’s subjective perception of an item’s valence, which can be measured with the standard deviation (SD) of items’ valence ratings in normed materials. Extensive data, using different stimuli (e.g., words, pictures), have demonstrated that (a) items’ average valence intensity ratings are unrelated to the uncertainty of those ratings (Brainerd 2018), and (b) valence and arousal intensity ratings are strongly related (arousal ratings increase as valence ratings increase) only when the uncertainty of valence ratings is low (Brainerd 2018; Brainerd and Bookbinder 2019; Mattek et al. 2017). Thus, it is possible that the ambiguity of emotional content, as well as its intensity, determines how memory is affected by emotional content.
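To make the distinction between intensity and ambiguity concrete, the following sketch computes the mean and SD of one item's valence ratings. The ratings are hypothetical values on a 9-point scale, and the use of the sample SD (n − 1 denominator) is an assumption, because the norms' exact estimator is not specified here.

```python
from statistics import mean, stdev


def valence_mean_and_ambiguity(ratings):
    """Return (mean rating, SD of ratings) for one item.

    The mean indexes the item's valence; the SD indexes valence ambiguity,
    that is, how much raters disagree about the item's valence.
    """
    if len(ratings) < 2:
        raise ValueError("at least two ratings are needed to estimate an SD")
    return mean(ratings), stdev(ratings)


# Hypothetical ratings from six raters (not taken from any published norms).
consensus_item = [8, 8, 7, 8, 9, 8]   # raters largely agree
contested_item = [2, 9, 5, 8, 1, 7]   # raters disagree widely

consensus_mean, consensus_sd = valence_mean_and_ambiguity(consensus_item)
contested_mean, contested_sd = valence_mean_and_ambiguity(contested_item)
print(f"consensus item: M = {consensus_mean:.2f}, SD = {consensus_sd:.2f}")
print(f"contested item: M = {contested_mean:.2f}, SD = {contested_sd:.2f}")
```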

The fact that the emotional ambiguity hypothesis and related findings are so recent means that ambiguity of valence and arousal ratings has not figured in prior emotion-false memory experiments. In order to encourage such research, apart from mean valence and arousal intensity ratings, the descriptive data of the CEL in Table 1 also include SDs of those ratings.

The present study

The purpose of the present study was to generate norming data for the CEL that investigators can use to select lists for emotion-false memory experiments. Using procedures that are very similar to those used in norming studies for other pools of DRM lists (e.g., Roediger et al. 2001; Stadler et al. 1999), we collected data from three different universities and tested both recall and recognition of all 32 CEL lists. All subjects were presented with 16 of the 32 lists, followed by free recall and then recognition tests. Hence, variability in true recall/recognition and false recall/recognition could all be tracked as a function of the valence and arousal of list words and critical distractors.

Method

Subjects

The sample consisted of 228 undergraduates in total. Of these, 109 (73 females and 36 males) were recruited from University A, a private university in the northeastern US; 57 (30 females and 27 males) were recruited from University B, a public university in the northeastern US; and 62 (32 females and 30 males) were recruited from University C, a public university in the southeastern US. The subjects at all three universities participated in order to fulfill a course requirement. All subjects were native English speakers. The data were gathered over a 3-year period, with the subjects in the different universities participating during different academic years.

Materials

The materials were the 32 CEL lists shown in Table 1. The critical distractor and list words for each list as well as the mean valence, SD valence, mean arousal, SD arousal, and BAS data for each word are displayed in Table 1. As mentioned, the valence and arousal ratings were drawn from the WKB norms (Warriner et al. 2013), and the BAS data were drawn from Nelson et al.’s (2004) norms of word association. The 32 lists were evenly divided into negative/high arousal lists, negative/low arousal lists, positive/high arousal lists and positive/low arousal lists (see Table 2).

Procedure

Overall, the procedure resembled the DRM norming methodology of Stadler et al. (1999) and Roediger et al. (2001). Subjects were tested in small groups of 4–5 individuals. Before the experiment began, all subjects were provided with a recall booklet. For each subject, sixteen 15-word DRM lists were presented via computer recordings. The words on individual lists were presented at a 2-s rate in a neutral voice, with a 2-s interval between consecutive words. The 16 lists that were administered to individual subjects were counterbalanced across subjects, so that each list was presented to an equal number of subjects. After each list was presented, subjects performed written free recall for that list before the next list was presented. Subjects were given 90 s to recall each list, and they were told to stop writing and turn to the next page of the recall booklet when time was up. Following the presentation of all 16 lists, subjects worked on math problems for 1 min as a buffer task. After that, they completed a self-paced old/new recognition test with a scantron. There were 74 test items on the recognition test, including 48 targets (three list words from each list, taken from the 5th, 7th, and 9th list presentation positions), 16 critical distractors (one critical distractor from each list), and ten unrelated distractors (randomly selected from unpresented lists). For each test item, subjects were instructed to bubble “a” if it was an old word and bubble “b” if it was a new word.
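The composition of the recognition test described above can be sketched as follows. The list contents are placeholders rather than CEL words. Drawing unrelated distractors from the list words (rather than the critical distractors) of unpresented lists, and shuffling the test order, are assumptions that the procedure does not spell out.

```python
import random


def build_recognition_test(studied, unstudied, n_unrelated=10, seed=0):
    """Assemble (word, item_type) pairs for an old/new recognition test.

    studied and unstudied map each critical distractor to its 15 list words
    in presentation order.
    """
    rng = random.Random(seed)
    items = []
    for critical, words in studied.items():
        # Targets come from the 5th, 7th, and 9th presentation positions.
        for position in (5, 7, 9):
            items.append((words[position - 1], "target"))
        items.append((critical, "critical_distractor"))
    # Assumption: unrelated distractors are list words from unpresented lists.
    pool = [word for words in unstudied.values() for word in words]
    for word in rng.sample(pool, n_unrelated):
        items.append((word, "unrelated_distractor"))
    rng.shuffle(items)  # Assumption: test order is randomized.
    return items


def placeholder_lists(prefix, n_lists):
    """Create placeholder 15-word lists keyed by a placeholder critical distractor."""
    return {
        f"{prefix}{i}_critical": [f"{prefix}{i}_word{j}" for j in range(1, 16)]
        for i in range(n_lists)
    }


studied = placeholder_lists("studied", 16)
unstudied = placeholder_lists("unstudied", 16)
test_items = build_recognition_test(studied, unstudied)
print(len(test_items))
```

With 16 studied lists, the sketch yields 48 targets, 16 critical distractors, and 10 unrelated distractors, matching the 74 test items reported above.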

Results

The descriptive data for recall and recognition are summarized in Table 3 and Table 4, respectively. In Table 3, true and false recall results from the three universities (A, B, and C) and the grand means are reported. The true recall level was calculated as the number of targets recalled divided by the total number of targets, and the false recall level was calculated as the number of critical distractors recalled divided by the total number of critical distractors. The recall data for the spider list and the fall list are missing from the University A protocols because these two lists were inadvertently not presented to those subjects.
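The scoring rule for recall can be sketched for a single subject and a single list as follows. The function name and the matching rule (case-insensitive exact match) are assumptions. The example uses only the five words quoted earlier from the sleep list, so it is a shortened illustration rather than a full 15-word list.

```python
def score_free_recall(recalled, list_words, critical_distractor):
    """Score one recall protocol for one list.

    Returns (true_recall, false_recall): the proportion of list words recalled,
    and 1.0 or 0.0 for whether the critical distractor was recalled.
    """
    recalled_set = {word.strip().lower() for word in recalled}
    targets = {word.lower() for word in list_words}
    true_recall = len(recalled_set & targets) / len(targets)
    false_recall = 1.0 if critical_distractor.lower() in recalled_set else 0.0
    return true_recall, false_recall


# Shortened illustration: five words from the sleep list, hypothetical protocol.
list_words = ["bed", "rest", "awake", "tired", "dream"]
recalled = ["bed", "sleep", "dream", "pillow"]
true_recall, false_recall = score_free_recall(recalled, list_words, "sleep")
print(true_recall, false_recall)
```

Averaging these per-list scores over lists and subjects gives the proportions defined above, because each list contributes exactly one critical distractor.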

Table 3 Mean true and false recall
Table 4 Mean bias-corrected true and false recognition

In Table 4, true recognition was measured by the bias-corrected acceptance rate for targets, and false recognition was measured by the bias-corrected acceptance rate for critical distractors. Here, we used the two-high-threshold statistic Pr (Snodgrass and Corwin 1988) for bias correction, which is calculated by subtracting false alarm rates for unrelated distractors from hit rates for targets and from false alarm rates for critical distractors. The University A false recognition data are missing for the Pretty list because its critical distractor had been presented as part of another list during the study phase. For the same reason, the University C false recognition data are missing for the Hurt list, Spider list, Sick list and Fall list. Importantly, ample false recognition data for these items are available from the other two universities.
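The bias correction described above amounts to two subtractions from a common baseline. The sketch below illustrates it with hypothetical response counts. The function name is illustrative, and the level at which the rates were computed in the actual analyses (per subject, per list, or per condition) is not assumed here.

```python
def bias_corrected_rates(target_hits, n_targets,
                         critical_fas, n_critical,
                         unrelated_fas, n_unrelated):
    """Compute bias-corrected true and false recognition (Pr).

    The false alarm rate for unrelated distractors serves as the baseline that
    is subtracted from the target hit rate and from the critical-distractor
    false alarm rate.
    """
    baseline = unrelated_fas / n_unrelated
    true_recognition = target_hits / n_targets - baseline
    false_recognition = critical_fas / n_critical - baseline
    return true_recognition, false_recognition


# Hypothetical counts: 36 of 48 targets, 12 of 16 critical distractors,
# and 1 of 10 unrelated distractors called "old".
true_pr, false_pr = bias_corrected_rates(36, 48, 12, 16, 1, 10)
print(f"true Pr = {true_pr:.2f}, false Pr = {false_pr:.2f}")
```

Because the same baseline is removed from both measures, corrected values can be negative when a subject accepts unrelated distractors more often than targets or critical distractors.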

We conducted 2 (valence: positive, negative) × 2 (arousal: high, low) repeated measures analyses of variance (ANOVAs), using false recall, false recognition, true recall, and true recognition as dependent variables, respectively. In addition, we conducted post hoc tests using Tukey’s honest significant difference (HSD) test. The ANOVAs and post hoc analyses were conducted for the aggregated data of the three universities and separately for the University A, B, and C data, using the lmerTest package in R (Kuznetsova et al. 2017). In what follows, we first report the false recall and false recognition results as emotion-false memory is the topic of primary interest. Next, we report the true recall and true recognition results. All results that are reported below were significant at or beyond the .05 level.

False recall

Aggregated results

The aggregated results for the three universities produced a valence main effect, F(1, 684) = 51.27, MSE = .04, p < .0001, partial η2 = .07, and a Valence × Arousal interaction, F(1, 684) = 36.38, MSE = .04, p < .0001, partial η2 = .05. According to the post hoc tests, the false recall levels were higher for negative lists than for positive lists. The interaction indicated that negative/high arousal lists elicited more false recall than positive/high arousal lists, p < .0001, but false recall levels for negative/low arousal lists and positive/low arousal lists did not differ.

University A

There was a main effect of valence, F(1, 331) = 28.49, MSE = .04, p < .0001, partial η2 = .08, and a Valence × Arousal interaction, F(1, 331) = 12.62, MSE = .04, p = .0004, partial η2 = .04. Post hoc tests suggested that false recall was higher for negative lists than for positive lists. In addition, the valence effect was modulated by arousal level. False recall was higher for negative/high arousal lists than for positive/high arousal lists, p < .0001, but did not differ between negative/low arousal lists and positive/low arousal lists.

University B

A main effect of valence was found, F(1, 165) = 6.42, MSE = .05, p = .01, partial η2 = .04, as well as a Valence × Arousal interaction, F(1, 165) = 15.33, MSE = .05, p = .0001, partial η2 = .09. False recall was higher for negative lists than for positive lists, p = .01. Also, false recall was higher for negative/high arousal lists than for positive/high arousal lists, p = .0005, but did not differ between negative/low arousal lists and positive/low arousal lists. Thus, the qualitative patterns for false recall were the same for Universities A and B.

University C

There was a valence main effect, F(1, 183) = 18.32, MSE = .04, p < .0001, partial η2 = .09, and a Valence × Arousal interaction, F(1, 183) = 9.21, MSE = .04, p = .003, partial η2 = .05. Negative lists elevated false recall compared to positive lists. Also, the difference in false recall was only reliable between negative/high arousal lists and positive/high arousal lists, p < .0001. Thus, the qualitative patterns for the valence and arousal effects on false recall were the same in subjects from all three universities.

False recognition

Aggregated results

The aggregated results for the three universities yielded a valence main effect, F(1, 623) = 59.92, MSE = .06, p < .0001, partial η2 = .09, such that false recognition was higher for negative lists than for positive lists. No arousal effect or Valence × Arousal interaction was observed. Thus, negative valence increased false recognition regardless of arousal level.

University A

The University A data produced a valence main effect, F(1, 264) = 29.10, MSE = .07, p < .0001, partial η2 = .10, and a Valence × Arousal interaction, F(1, 264) = 6.14, MSE = .07, p = .02, partial η2 = .02. According to the post hoc tests, negative lists produced more false recognition than positive lists, p < .0001. In addition, negative/high arousal lists produced higher false recognition than positive/high arousal lists, p < .0001, but negative/low arousal lists and positive/low arousal lists did not differ.

University B

There was a valence main effect, F(1, 168) = 4.20, MSE = .05, p = .04, partial η2 = .02, such that false recognition of critical distractors was higher for negative lists than for positive lists. However, unlike University A, there was no reliable Valence × Arousal interaction.

University C

The results for University C were the same as those for University B. We found a valence main effect, F(1, 183) = 26.46, MSE = .06, p < .0001, partial η2 = .13, such that false recognition was higher for negative lists than for positive lists, but no Valence × Arousal interaction. Taken together, then, the data of Universities B and C provide evidence that negative valence can increase false recognition even when arousal is low.

True recall

Aggregated results

Concerning true recall, the aggregated results for the three universities produced a valence main effect, F(1, 684) = 141.50, MSE = .004, p < .0001, partial η2 = .17; an arousal main effect, F(1, 684) = 26.05, MSE = .004, p < .0001, partial η2 = .04; and a Valence × Arousal interaction, F(1, 684) = 64.49, MSE = .004, p < .0001, partial η2 = .09. Post hoc tests revealed that positive lists produced more true recall than negative lists, and high arousal lists produced more true recall than low arousal lists. In addition, although the difference in true recall between positive/high arousal lists and negative/high arousal lists (p < .0001) was much larger than the corresponding difference between positive/low arousal lists and negative/low arousal lists (p = .03), both differences were reliable.

University A

There was a main effect for valence, F(1, 331) = 117.77, MSE = .003, p < .0001, partial η2 = .26; a main effect for arousal, F(1, 331) = 7.30, MSE = .003, p = .007, partial η2 = .02; and a Valence × Arousal interaction, F(1, 331) = 84.56, MSE = .003, p < .0001, partial η2 = .20. The post hoc tests revealed that positive lists produced more true recall than negative lists, and true recall was better for high arousal lists than for low arousal lists. In addition, the positive/high arousal lists produced more true recall than the negative/high arousal lists, p < .0001, but the positive/low arousal lists and negative/low arousal lists did not differ.

University B

We again found a main effect for valence, F(1, 165) = 25.03, MSE = .003, p < .0001, partial η2 = .13, and a main effect for arousal, F(1, 165) = 15.72, MSE = .003, p = .0001, partial η2 = .09, but no interaction between them. The reasons for the main effects were the same as in University A subjects: True recall was better for positive lists than for negative lists, and was better for high arousal lists than for low arousal lists. The absence of an interaction provided the first evidence of valence effects on recall that do not depend on level of arousal.

University C

These results were the same as those for University A subjects. There was a valence main effect, F(1, 183) = 21.25, MSE = .005, p < .0001, partial η2 = .10, an arousal main effect, F(1, 183) = 7.31, MSE = .005, p = .008, partial η2 = .04, and a Valence × Arousal interaction, F(1, 183) = 7.98, MSE = .005, p = .005, partial η2 = .04. True recall was better for positive lists than for negative lists, and better for high arousal lists than for low arousal lists. The valence effect was moderated by arousal, with the difference in true recall being reliable only between positive/high arousal lists and negative/high arousal lists, p < .0001.

True recognition

Aggregated results

When the true recognition data of the three subject samples were aggregated, there was a valence main effect, F(1, 623) = 76.37, MSE = .02, p < .0001, partial η2 = .11, but neither an arousal main effect nor a Valence × Arousal interaction was observed. Thus, true recognition was better for positive lists than for negative lists, and arousal neither directly influenced true recognition nor moderated the valence effect.

University A

We found only a main effect of valence, F(1, 264) = 9.72, MSE = .02, p = .002, partial η2 = .04. Similar to recall, the positive lists produced more true recognition than negative lists, but unlike recall, there was no arousal main effect, and the valence effect did not depend on the level of arousal.

University B

Similar to the results from University A, only a valence main effect was obtained, F(1, 168) = 57.69, MSE = .02, p < .0001, partial η2 = .26. True recognition was again better for positive lists than for negative lists.

University C

The results from University C displayed the same pattern as those from Universities A and B. There was a valence main effect, F(1, 183) = 17.37, MSE = .02, p < .0001, partial η2 = .09, but no arousal effect or interaction. True recognition was again better for positive lists than for negative lists.

Discussion

Our general aim in this article has been to advance methodological standardization across studies of how emotional content influences false memory by providing a set of emotional DRM lists (the CEL; Table 1) that have been normed on three subject samples. The valence and arousal levels of the critical distractors and list words vary systematically over the lists. This feature allows investigators to determine how these properties of emotional content influence true and false memory under a variety of theoretically motivated conditions—for instance, immediate versus delayed testing, surface versus semantic encoding, and intentional versus incidental learning, and across a number of important individual difference dimensions—for instance, age, attitude, clinical diagnoses, cognitive ability, and personality.

In that connection, the advantage of studying emotion-false memory effects with DRM-type emotional lists is that the DRM procedure has key methodological strengths. Specifically, it is able to produce robust false memory with a procedure that is both very simple and highly adaptable. Concerning simplicity, as we saw in the present norming study, only a brief induction phase, in which subjects study some short word lists, is required to produce reliable levels of false recall and false recognition. Indeed, under certain circumstances, false memories can be detected only a few seconds after the presentation of a single four-word DRM list (Abadie and Camos 2019; Atkins and Reuter-Lorenz 2011). With respect to adaptability, the DRM procedure is so flexible that it can be adjusted to meet the requirements of virtually any type of study. Crucially, this includes the restrictive requirements of fMRI studies (Dennis et al. 2012; Kurkela and Dennis 2016) and studies of cognitively impaired populations (Brainerd et al. 2006; Budson et al. 2006).

The valence, arousal and BAS ratings for the critical distractors and list words of the CEL are reported in Table 1, with the counterbalancing of valence and arousal illustrated in Table 2. The recall and recognition data for the CEL are summarized in Tables 3 and 4, respectively. When the CEL was initially developed, the mean valence and arousal ratings for many words were missing, because they were not rated in the ANEW (Bradley and Lang 1999) norms. Accordingly, in Table 1, we have provided mean valence and arousal ratings for the CEL words using the WKB (Warriner et al. 2013) norms instead, which contain 13 times as many words as the ANEW norms. Besides the mean valence and arousal intensity ratings, we have also included SDs for these ratings in Table 1. According to the emotional-ambiguity hypothesis (Brainerd 2018; Mattek et al. 2017), the effects of valence and arousal on psychological processes are moderated by uncertainty in subjective valence ratings, which can be estimated with SDs. However, as this hypothesis was proposed so recently, prior emotion-memory studies have not included the ambiguity factor. The variability data in the CEL norms allow investigators either to manipulate ambiguity factorially or to include it as a covariate when either valence or arousal is manipulated.

By far the most important feature of the norming results is the level of consistency in the effects that were observed for the three subject samples. A major criticism of the extant emotion-false memory literature is the inconsistent findings for valence and arousal manipulations across experiments (see Bookbinder and Brainerd 2016). However, in the current study, the analyses of the false recall, false recognition, true recall, and true recognition data for the individual universities produced the same qualitative patterns, with only two minor exceptions. Those exceptions involved the false recognition data and the true recall data. In false recognition, a reliable Valence × Arousal interaction was obtained for University A but not for University B, University C or the aggregated data. In true recall, a reliable Valence × Arousal interaction was obtained for University A, University C, and the aggregated data but not for University B. Considering the number of ANOVAs that were conducted, it is not surprising, from a statistical point of view, that two minor findings varied among the subject samples.

Overall, false recall and false recognition were more marked for negative lists than for positive lists (see Tables 3 and 4). Neither type of false memory varied consistently as a function of arousal. These results echo Brainerd and Bookbinder’s (2019) finding that compared to arousal, valence is more strongly correlated with semantic properties that reliably predict false memory (e.g., meaningfulness, familiarity). Also, the effects of valence on false memory were moderated by arousal for false recall but not false recognition: The difference between negative versus positive valence in false recall widened as arousal increased. Notably, valence affected true memory differently than it affected false memory. In particular, true recall and true recognition were better with positive lists than with negative lists, which is consistent with other studies that have factorially manipulated valence and arousal (Brainerd et al. 2010; Gomes et al. 2013; Libkuman et al. 2004). Empirically, this is a classic example of a double dissociation, wherein a manipulation has opposite effects on different performance measures. The specific pattern of opposite valence effects for true and false memory is predicted by fuzzy-trace theory’s (FTT’s) account of emotion-false memory effects. According to that account (see Bookbinder and Brainerd 2016; Brainerd and Bookbinder 2019), negative content enhances gist memory (memory for items’ semantic content and other relational information, which is the primary process that underlies false recall and false recognition), but positive content enhances verbatim memory (memory for item-specific surface details, which is the primary process that underlies true recall and true recognition).

The fact that for both recall and recognition, negative lists produced more false memory than positive lists is consistent with some prior studies that used DRM-type materials (Brainerd et al. 2008a; Howe 2007; Howe et al. 2010; Sharkawy et al. 2008). These results have important implications in law and other real-life settings, where it is often claimed that memories of negative events are especially well preserved and highly resistant to distortion (Laney and Loftus 2010). It is worth mentioning that in most of the prior emotional DRM studies, valence was confounded with arousal, which we managed to avoid in the current study. Importantly, we observed a robust interaction between valence and arousal for false recall, but not for false recognition: The valence effect on false recall was much larger when arousal was high than when it was low. Based on Bookbinder and Brainerd’s (2016) review, that result seems to be new to the CEL. Why does arousal modulate the valence effect in false recall but not in false recognition? One possibility lies in a difference between recall and recognition tests: The former is more sensitive to verbatim memory, whereas the latter is more sensitive to gist memory (Seamon et al. 2003). As FTT predicts, the arousal effect is predominantly rooted in verbatim memory (Brainerd and Bookbinder 2019). Specifically, verbatim memory should improve as arousal increases from low to moderate but decline as arousal increases from moderate to high (Bookbinder 2017). Thus, owing to the fact that recall tests favor verbatim memory more than recognition tests, it is expected that arousal should have a stronger effect on recall than on recognition.

Continuing with the detailed norming results, we found for both recall and recognition that positive content elevated true memory relative to negative content. In addition, arousal moderated the valence effect in recall but not in recognition, just as it did in false memory, and it also had a main effect. High arousal lists produced more true recall than low arousal lists, and there was an interaction such that the valence effect was more marked for high arousal lists. Again, the arousal modulation effect was found for recall but not for recognition, and the difference in sensitivity to verbatim memory between recall and recognition tests may be responsible. If the arousal effect is primarily verbatim-based, as FTT expects, it would be more likely to show up in recall tests than in recognition tests.

It should be noted that the differences between recall and recognition that were detected in the CEL norming data are not without precedent in the emotion-false memory literature. Two points that were emphasized in Bookbinder and Brainerd’s (2016) review are that across experiments in this literature, (a) recall experiments have produced different emotion-false memory effects than recognition experiments, and (b) recognition effects have been more consistent than recall effects. However, as these authors also noted, it is difficult to conclude anything more than that because variations in valence were confounded with variations in arousal in all studies. The removal of that persistent confound is one of the great advantages of the CEL. Based on our norming results, recall and recognition do indeed produce different emotion-false memory effects with DRM-type tasks, and they also seem to produce different emotion-true memory effects.

Summing up, the current study provides norming data for 32 lists in which all list words and critical distractors have been normed for perceived valence (positive-negative) and perceived arousal (calm-exciting). These lists have also been normed for the levels of true recall, true recognition, false recall, and false recognition that they generate. The most robust patterns are that true memory increased and false memory decreased when the content of list words or critical distractors was positive compared to when it was negative. These patterns hold for both recall and recognition in the norming data, for all subject samples. Another consistent pattern is the Valence × Arousal interaction in recall, such that high arousal always amplifies the effects of valence. Because the valence and arousal of target materials have been so routinely confounded in prior research, a key point about this pattern is that it emerged from factorial manipulations of valence and arousal, which the CEL makes possible.