The study of memory is fundamental to our understanding of how humans interact with and make sense of the world around them. One of the most valuable aspects of memories is their ability to adapt to new information. Referred to as memory modification, this phenomenon occurs when a consolidated memory is reactivated with moderate prediction error, creating a need to modify the memory’s existing schema (Exton-McGuinness, Lee, & Reichelt, 2015; Gershman, Monfils, Norman, & Niv, 2017; Sevenster, Beckers, & Kindt, 2012). The memory then becomes labile for a period following the reactivation (for reviews, see Agren, 2014; Alberini & LeDoux, 2013; Wang & Morris, 2010).

Intentional memory modification is understudied, especially in humans. Most existing work has focused on reducing memory accuracy by interfering with content while the memory is labile (Agren, 2014). Much less is known about how memory reactivation affects the emotional quality of memories. Some studies have examined the modification of fear memories (cf. Agren et al., 2012; Gershman, Monfils, Norman, & Niv, 2017; Schiller et al., 2010), but these findings are limited to conditioned threat or phobia reminder paradigms and have not been reliably generalized (see Kindt & Soeter, 2013). In clinical settings, a major goal of treating depression, anxiety, and trauma is to lessen the stressful nature of episodic memories without losing key details (LaBar, 2015). Memory modification may give therapists a unique window of opportunity to enduringly alter the emotional aspects of aversive memories without affecting other memory features (Kindt & van Emmerik, 2016; Lane, Ryan, Nadel, & Greenberg, 2015).

How can this regulatory goal be more broadly achieved? Theories of memory modification suggest that one key principle is to interfere with the relevant working memory processes required during reactivation (James et al., 2015). A common, noninvasive way to interrupt ongoing affective processes is to engage cognitive emotion regulation skills, which involve reappraising the meaning of a stimulus to alter emotional responses (Buhle et al., 2014; Ray, McRae, Ochsner, & Gross, 2010). However, it is unknown whether engaging cognitive reappraisal during memory reactivation increases the efficacy of the regulation attempt. We addressed this question by training healthy adults in one reappraisal tactic: spatial distancing. Distancing is effective across many subject populations, is readily trained, and can be applied to diverse affective elicitors (Powers & LaBar, 2019). The current study incorporated spatial distancing into a memory modification paradigm to investigate whether partial reactivation of aversive picture memories, followed by regulation, would reduce subsequent emotional responses to the pictures. We expected spatial distancing following reactivation to have regulatory effects on both valence and arousal ratings (Davis, Gross, & Ochsner, 2011), while not altering recognition memory accuracy.

Materials and methods

Participants

One hundred and twenty-eight qualifying participants completed all three sessions of our study. An additional 47 participants did not complete the experiment because of dropout, failure to meet inclusion criteria, or severe weather. Participants had no history of psychiatric or neurological disorders and were not currently taking any psychoactive medications. An intended sample size of 30 participants per group (120 participants in total) was set in advance, based on prior sample sizes in the memory modification literature (cf. Hupbach & Dorskind, 2014). We stopped recruitment once we estimated that our intended sample size would be reached by the end of the 3-day study. Nine participants were excluded because of incomplete data or because they were statistical outliers (1–3 per group). Outliers were defined as participants whose valence or arousal difference scores were more than two standard deviations from their experimental group’s mean; these individuals were excluded from all analyses. This procedure was intended to minimize the influence of these few outliers on the group-averaged statistical results and to eliminate individuals who may have misunderstood the task instructions or executed the task incorrectly. We thus ended up one participant short of our intended sample size. Nonetheless, post hoc power analyses conducted in G*Power 3.1 indicated that, for both the ANOVA and ANCOVA arousal analyses, this sample size provided an estimated power of over 95%. The 119 participants (mean age = 19.10 ± 0.14 years; 60 female; 50% White, 29% Asian, 9% Black, 4% mixed race, 7% unidentified; 14% identified as Hispanic/Latino) were distributed approximately equally across the four experimental conditions by pseudorandom assignment (reactivation + regulation: n = 30; no regulation: n = 30; no reactivation: n = 29; neither: n = 30).
Participants were recruited from undergraduate psychology courses through a university recruitment website, and they gave written consent according to the requirements of the Duke University Campus Institutional Review Board. Participants received course credit for their time.
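The outlier rule described above can be expressed compactly. The sketch below is an illustration under stated assumptions, not the authors' analysis code: the record layout and its keys (`id`, `group`, `valence_diff`, `arousal_diff`) are hypothetical, and using the sample standard deviation, computed within each group with the candidate outlier included, is an assumption, because the text does not specify either detail.

```python
from collections import defaultdict
from statistics import mean, stdev

def flag_outliers(participants, n_sd=2.0):
    """Return the IDs of participants whose valence OR arousal difference
    score lies more than n_sd sample SDs from their own group's mean.

    participants -- list of dicts with the hypothetical keys
                    'id', 'group', 'valence_diff', 'arousal_diff'
    """
    by_group = defaultdict(list)
    for p in participants:
        by_group[p["group"]].append(p)

    flagged = set()
    for members in by_group.values():
        for key in ("valence_diff", "arousal_diff"):
            scores = [m[key] for m in members]
            mu, sd = mean(scores), stdev(scores)  # stdev needs >= 2 members
            for m in members:
                if abs(m[key] - mu) > n_sd * sd:
                    flagged.add(m["id"])
    return flagged
```

Because each group's mean and SD include the extreme score itself, a single outlier in a very small group can never cross the two-SD line (the largest possible |z| is (n − 1)/√n); with groups of about 30, as here, this is not a practical limitation.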

Stimuli

The study material consisted of 25 test images (20 negative and five positive), 25 control images (20 negative and five positive), and 12 additional new images (nine negative and three positive). The positive images were included to prevent prolonged negative mood induction but were not used in the analyses. The images were chosen from the International Affective Picture System (IAPS; Lang, Bradley, & Cuthbert, 2008). The 50 test and control images all had differing thematic content, but the negative items from the two lists had similar average normative ratings of valence and arousal: valence, t(38) = 1.72, p = .09, test mean = 2.41 ± 0.47, control mean = 2.68 ± 0.52; arousal, t(38) = 1.47, p = .15, test mean = 5.96 ± 0.64, control mean = 5.67 ± 0.61. The new images were chosen to have thematic content similar to that of the test images and similar normative ratings of valence and arousal: valence, t(27) = 0.21, p = .83, new images mean = 2.37 ± 0.48; arousal, t(27) = 0.35, p = .73, new images mean = 5.87 ± 0.65. Fifteen additional negative images with higher normative valence ratings and lower arousal ratings were selected for the practice/training sessions. These training stimuli were chosen to ease participants into the study and to avoid habituation to the high-arousal, low-valence negative stimuli. To create partial reactivation pictures from the test, control, and practice images, a central emotional element in each scene was selected. Using the GNU Image Manipulation Program (GIMP; http://gimp.org), the chosen element was isolated and placed on a neutral gray rectangle with the dimensions of the original image (see Fig. 1 for an example). All pictures were presented centrally on a black screen. Stimuli were presented on a 19-inch (~48 cm) monitor (approximately 30.5 cm × 38 cm). Participants sat approximately 60 cm from the screen.
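The reported t statistics are consistent with a standard pooled-variance (Student's) independent-samples t test on the normative ratings, treating the ± values as standard deviations. The text does not name the exact test, so that is an assumption; the sketch below simply recomputes the statistics from the reported means, SDs, and list sizes (20 negative test, 20 negative control, and nine negative new images).

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t (equal variances assumed), computed
    from group means, standard deviations, and sizes. Returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Negative test vs. control images (n = 20 each), normative valence
t, df = pooled_t(2.41, 0.47, 20, 2.68, 0.52, 20)
print(f"t({df}) = {abs(t):.2f}")  # prints: t(38) = 1.72
```

The same function applied to the arousal comparison and to the new-image comparisons (with n2 = 9) returns 1.47, 0.21, and 0.35, matching the values reported above.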

Fig. 1

Experimental design. The four groups of participants completed the same tasks in Sessions 1 and 3 but differed in their Session 2 task, which is specified by group in the associated table. The images used in this figure are representative examples and were not used in the actual study. War photo: Max Pixel (CC0). Snake: Glenn Bartolotti (CC BY-SA 4.0), from Wikimedia Commons

All groups saw the 25 test images in Session 1. Groups that completed Session 2 saw the partial reactivation pictures from either the test (reactivation + regulation, no regulation) or control (no reactivation) image set. In Session 3, all groups saw the original test images and new images presented in a pseudorandom order.

Procedure

The study consisted of three sessions, with consecutive sessions 48 hours apart. As much as possible, each participant came in at the same time of day for all three sessions. Participants were split into four groups: reactivation + regulation (experimental group), no reactivation (control for just regulation), no regulation (control for just reactivation), and neither (control for time). The experimental procedure is illustrated in Fig. 1.

In Session 1, all participants viewed the test images and rated them on valence and arousal. Each image was presented for 3 seconds, followed by valence (1 = unhappy to 9 = happy) and arousal (1 = calm to 9 = excited) ratings for 2.5 seconds each. A 4-second “Relax” screen served as the intertrial interval. Pictures were pseudorandomized such that no more than six negative images could be presented in a row.
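One simple way to meet the run-length constraint is rejection sampling: reshuffle until the longest run of negative images is at most six. The text does not state how the pseudorandom orders were generated (they may have been fixed lists prepared in advance), so this is only a sketch; the function names and placeholder image IDs are hypothetical.

```python
import random

def longest_run(labels, target):
    """Length of the longest consecutive run of `target` in `labels`."""
    best = current = 0
    for label in labels:
        current = current + 1 if label == target else 0
        best = max(best, current)
    return best

def constrained_order(stimuli, max_negative_run=6, rng=None, max_tries=10_000):
    """Shuffle (image_id, category) pairs until no more than
    `max_negative_run` negative images appear consecutively."""
    rng = rng or random.Random()
    order = list(stimuli)
    for _ in range(max_tries):
        rng.shuffle(order)
        if longest_run([cat for _, cat in order], "negative") <= max_negative_run:
            return order
    raise RuntimeError("no valid order found within max_tries")

# Session 1 test set: 20 negative and 5 positive images (placeholder IDs)
test_set = ([(f"neg{i:02d}", "negative") for i in range(20)]
            + [(f"pos{i}", "positive") for i in range(5)])
order = constrained_order(test_set, rng=random.Random(2024))
```

With 20 negative and five positive images, roughly one random shuffle in six satisfies the constraint, so the loop typically needs only a handful of attempts; a seeded `random.Random` makes a given order reproducible.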

Session 2 differed depending upon group. The reactivation + regulation group viewed emotional objects cut from the test pictures and were asked to regulate them. Specifically, each object appeared for 4 seconds, followed by a black screen with the cue “Far,” which instructed participants to imagine the object extremely far away from them for 8 seconds. This presentation order differs from that of some emotion regulation paradigms, which display the regulation instruction before the stimulus; our ordering allowed participants to apply the regulation technique to the reactivated memory, as is typical in memory modification paradigms. Immediately following the regulation attempt, participants rated their success at implementing the technique, from 1 (not at all successful) to 4 (very successful). Trials were separated with a “Relax” screen, and the stimuli were pseudorandomized as in Session 1. Before Session 2, participants in the reactivation + regulation group completed emotion regulation training to ensure they could regulate effectively. Participants were first instructed that when they saw the word “Far,” they were to imagine the object they had just seen as physically far away from them in egocentric space, such as across a football field, in a different country, or in space. The experimenter clarified that the same location could be used for multiple or even all stimuli, as long as the participant could successfully perform the imagination. Participants then completed three untimed trials in which they verbally described to the experimenter what they were imagining during the regulation period, to ensure task compliance. The experimenter gave feedback and further instruction during this time if necessary.
If the participant did not have a solid grasp of the technique by the third trial, additional untimed trials were added until the participant either performed the technique accurately or reached eight trials, after which they would have been told that they did not pass the training (all participants in our study passed training). Participants who successfully learned the technique then completed seven additional training trials on their own at the experimental pace.

The no-reactivation control group received the same emotion regulation training, but during Session 2 they saw emotional objects cut from the control images instead of the test images and proceeded to regulate these objects instead. The no-regulation control group was presented with the test objects during Session 2, but instead of regulating and rating success, they were simply asked to answer whether the object portrayed a human or not (for 2 seconds) as an active perceptual judgment control. Lastly, the neither control group skipped Session 2 entirely as a passive control for time between Session 1 and Session 3.

Ninety-six hours after the first session, all participants returned for Session 3. Regardless of group, participants were presented with the original test images intermixed with new images using the same pseudorandomization procedure as in Sessions 1 and 2. After viewing an image for 3 seconds, participants were asked to rate it on valence and arousal. They were then asked whether it was an old (test) image or a new image. Lastly, they had to indicate how confident they were of their old–new judgment, from 1 (guess) to 4 (certain). Trials were again separated by a “Relax” screen. Before the session, participants completed a two-trial practice minisession to familiarize them with the order and speed of the rating questions.

Questionnaires

We used the reappraisal and suppression subscales of the Emotion Regulation Questionnaire (ERQ; Gross & John, 2003) to account for individual differences in regulation strategy use prior to experimental training. The Social Desirability Scale (SDS17; Stöber, 2001) was used to assess participants’ propensity to give socially desirable responses. These questionnaires were administered before Session 1. Lastly, before and after each session, participants completed the Subjective Units of Distress Scale (SUDS; Wolpe, 1969) to assess distress levels before and after the manipulation.

Electrodermal activity measurement

We measured phasic changes in electrodermal activity (EDA) to act as a gauge of sympathetic nervous system activity in response to our stimuli. Two Ag-AgCl electrodes with 11-mm diameter contact areas (BIOPAC Systems, Goleta, CA) were placed on the hypothenar eminence of the participant’s nondominant palm. K-Y Jelly (Reckitt Benckiser, Slough, England) was used as additional conductive gel.

The raw EDA signal was sampled at a frequency of 1 kHz, gain amplified at 10 μS/V. A 1 Hz high-pass filter was implemented through AcqKnowledge software (BIOPAC Systems, Goleta, CA). Trough-to-peak EDA measurements were extracted using the scoring system Autonomate (Green et al., 2014) such that EDA peaks beginning within a second of image onset up through 4 seconds poststimulus were considered valid responses.
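Autonomate's scoring algorithm is not reproduced here. As a rough illustration of what a trough-to-peak measure is, the sketch below takes the lowest point of the signal within a post-onset window as the response onset and measures the largest rise within a few seconds after it. The function name, the default (1, 4) s window (one reading of the onset criterion above), the 5-s rise limit, and the 0.02 μS minimum amplitude (a commonly used threshold, not one stated in the text) are all assumptions.

```python
def trough_to_peak(signal, fs, onset_idx, window=(1.0, 4.0),
                   max_rise=5.0, min_amp=0.02):
    """Simplified trough-to-peak amplitude (in microsiemens) for one trial.

    signal    -- skin conductance samples (microsiemens), as a list
    fs        -- sampling rate in Hz (the study recorded at 1000 Hz)
    onset_idx -- sample index of image onset
    window    -- seconds after onset in which the trough (response onset)
                 must fall; an assumed reading of the scoring criterion
    max_rise  -- seconds after the trough searched for the peak (assumed)
    min_amp   -- smallest rise counted as a response (assumed threshold)
    Returns the amplitude, or 0.0 when no qualifying response is found.
    """
    start = onset_idx + int(window[0] * fs)
    end = min(len(signal), onset_idx + int(window[1] * fs) + 1)
    segment = signal[start:end]
    if not segment:
        return 0.0
    # Response onset: the lowest point inside the scoring window
    trough_idx = start + segment.index(min(segment))
    # Peak: the highest point within max_rise seconds after the trough
    peak_end = min(len(signal), trough_idx + int(max_rise * fs) + 1)
    amplitude = max(signal[trough_idx:peak_end]) - signal[trough_idx]
    return amplitude if amplitude >= min_amp else 0.0
```

Returning 0.0 for subthreshold trials mirrors the convention, used in the responder analysis, of treating zero amplitudes as nonresponses.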

Data analysis

Missed responses, recorded as zeroes in the data, were omitted from the analyses (mean missed responses out of 20: reactivation + regulation, 0.91; no reactivation, 1.0; no regulation, 1.5; neither, 0.94). Participants’ changes in emotion ratings across the experiment were calculated by subtracting arousal and valence ratings at Session 1 from the same ratings at Session 3 to create difference scores. This was done for all stimuli and, for the reactivation + regulation group, also for the subset of stimuli reported as successfully regulated (success ratings of 3 or 4). Rating difference scores were analyzed using univariate analyses of variance (ANOVA) with an alpha of .05. The reappraisal and suppression subscores of the ERQ and the SDS17 score were included as covariates in the model for the full stimulus set. Results were corrected for multiple comparisons using Holm’s method, and effect sizes for t tests were calculated as Hedges’s gs (Lakens, 2013). Additional post hoc analyses tested whether an individual’s trait reappraisal, suppression, or social desirability scores could explain the variation in the reactivation + regulation group’s arousal scores.
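A minimal sketch of the per-participant difference score and of Holm's step-down correction follows. It is illustrative only: the dictionary layout is hypothetical, and because the text does not say whether missed responses were dropped image by image or before averaging each session, pairing ratings image by image is an assumption.

```python
def difference_score(session1, session3, images=None):
    """Mean per-image rating change (Session 3 minus Session 1).

    session1, session3 -- dicts mapping image ID to rating (1-9); 0 = missed
    images             -- optional subset of image IDs to include, e.g. the
                          images a participant rated 3 or 4 on regulation success
    Images with a missed response in either session are skipped.
    Negative values indicate lower ratings at Session 3.
    """
    ids = session1.keys() if images is None else images
    diffs = [session3[i] - session1[i] for i in ids
             if i in session1 and i in session3
             and session1[i] != 0 and session3[i] != 0]
    return sum(diffs) / len(diffs) if diffs else float("nan")

def holm(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max  # keeps adjusted values monotone
    return adjusted
```

For the three post hoc comparisons reported in the Results, the smallest raw p value would be multiplied by 3, the next by 2, and the largest by 1.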

Recognition accuracy (discrimination) was calculated as the proportion of negative test pictures correctly identified as old (hit rate) minus the proportion of negative new pictures incorrectly identified as old (false alarm rate). A univariate ANOVA was used to compare accuracy across groups.
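With 20 negative test images and nine negative new images at Session 3, the measure reduces to a difference of two proportions. A minimal sketch, assuming a hypothetical per-participant list of (is_old, said_old) pairs for the negative images only:

```python
def discrimination(trials):
    """Hit rate minus false-alarm rate for old/new recognition.

    trials -- iterable of (is_old, said_old) booleans, negative images only
    """
    trials = list(trials)
    old_responses = [said_old for is_old, said_old in trials if is_old]
    new_responses = [said_old for is_old, said_old in trials if not is_old]
    hit_rate = sum(old_responses) / len(old_responses)
    false_alarm_rate = sum(new_responses) / len(new_responses)
    return hit_rate - false_alarm_rate

# Example: 19 of 20 test images called "old", 1 of 9 new images called "old"
trials = ([(True, True)] * 19 + [(True, False)]
          + [(False, True)] + [(False, False)] * 8)
score = discrimination(trials)  # 0.95 - 1/9, about 0.84
```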

EDA trough-to-peak nonzero values were counted to classify participants as responders or nonresponders. Participants who responded to more than 10% of the stimuli in a session were considered responders for that session. If a participant was coded as a responder in both Session 1 and Session 3, their average trough-to-peak value from Session 1 was subtracted from their Session 3 average to create an EDA difference score. Unfortunately, we did not have enough viable EDA difference scores per group to run significance tests (reactivation + regulation: two participants; no reactivation: four; no regulation: three; neither: three), nor did we have enough responders to compare groups on Session 3 alone (reactivation + regulation: seven participants; no reactivation: five; no regulation: nine; neither: four). These data were therefore excluded from the results.
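The responder rule and the EDA difference score can be sketched as follows. The function names are hypothetical, and averaging over all trials (zeros included) rather than over nonzero responses only is an assumption; the text does not specify which average was used.

```python
def is_responder(amplitudes, min_fraction=0.10):
    """True if more than `min_fraction` of a session's trials show a
    nonzero trough-to-peak amplitude."""
    nonzero = sum(1 for a in amplitudes if a > 0)
    return nonzero / len(amplitudes) > min_fraction

def eda_difference(session1, session3):
    """Session 3 minus Session 1 mean amplitude, or None unless the
    participant qualifies as a responder in both sessions.
    Means are taken over all trials, zeros included (assumption)."""
    if not (is_responder(session1) and is_responder(session3)):
        return None
    return sum(session3) / len(session3) - sum(session1) / len(session1)
```

Returning None for participants who fail the rule in either session makes it explicit how few usable difference scores remained per group.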

Results

Arousal rating change from Session 1 to Session 3 significantly differed across groups, F(3, 115) = 7.34, p < .001, \( {\eta}_p^2 \) = 0.16 (see Fig. 2). Post hoc t tests revealed that the reactivation + regulation group (M = −1.32, SD = 1.23) had greater between-session arousal reduction relative to all three control groups: no reactivation (M = −0.68, SD = 0.91), t(57) = −2.29, p = .026, 95% CI [−1.21, −0.08], Hedges’s gs = 0.59; no regulation (M = −0.62, SD = 0.82), t(58) = −2.62, p = .022, 95% CI [−1.25, −0.17], Hedges’s gs = 0.67; neither (M = −0.23, SD = 0.59), t(58) = −4.39, p < .001, 95% CI [−1.59, −0.59], Hedges’s gs = 1.12. Changes in valence did not differ significantly across groups, F(3, 115) = 0.63, p = .595, \( {\eta}_p^2 \) = 0.02.
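As a check on the reported effect sizes, Hedges’s gs can be recomputed from the group means and SDs above using the pooled-SD formula with the small-sample correction described by Lakens (2013). This sketch uses the rounded summary statistics, so it gives approximately 1.12, 0.66, and 0.58 for the comparisons with the neither, no-regulation, and no-reactivation groups, respectively: close to, though not in every case identical with, the reported values.

```python
import math

def hedges_gs(m1, sd1, n1, m2, sd2, n2):
    """Hedges's g_s for two independent groups: Cohen's d_s with the
    pooled SD, times the correction 1 - 3 / (4 * (n1 + n2) - 9)."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d_s = (m1 - m2) / sd_pooled
    return d_s * (1 - 3 / (4 * (n1 + n2) - 9))

# Reactivation + regulation (n = 30) vs. neither (n = 30), arousal change
g = hedges_gs(-1.32, 1.23, 30, -0.23, 0.59, 30)
print(f"g_s = {abs(g):.2f}")  # prints: g_s = 1.12
```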

Fig. 2

Emotional difference scores per group. a Top: Violin plots displaying the distribution of arousal difference scores (Session 3 − Session 1) per group. More negative numbers represent greater reductions in arousal across the experiment. Bottom: Violin plots showing valence difference scores (Session 3 − Session 1) across groups, where larger numbers signify more positive valence ratings at Session 3. Points within each violin represent the mean of the distribution, and error bars represent 95% confidence intervals around the mean. b Same results as depicted in a, but in bar plots. Top: Arousal difference score. Bottom: Valence difference score. Here, error bars represent the standard error of the mean. Significance is depicted by square brackets above groups. *p < .05. **p < .01. ***p < .001. (Color figure online)

The above analyses included all stimuli. The results remained the same when only the stimuli that were successfully regulated (success ratings of 3 or 4) were included for the reactivation + regulation group: arousal, F(3, 115) = 6.29, p < .001, \( {\eta}_p^2 \) = 0.14; valence, F(3, 115) = 0.17, p = .915, \( {\eta}_p^2 \) < 0.01. The majority of stimuli were successfully regulated (M = 14.03 images out of 20, SD = 3.90). Thus, all subsequent analyses are reported only for the full stimulus set. Similarly, results remained the same after accounting for participant gender and experiment time of day (see Supplementary Material), so these measures were not included as covariates in subsequent analyses.

The groups did not differ in baseline reappraisal use, F(3, 115) = 0.04, p = .987, \( {\eta}^2 \) < .01; suppression use, F(3, 115) = 0.66, p = .580, \( {\eta}^2 \) = 0.02; or social desirability, F(3, 115) = 1.79, p = .153, \( {\eta}^2 \) = 0.05. Nonetheless, these measures were added to the model as covariates. The effect of group on arousal differences across the experiment remained significant after controlling for these measures, F(6, 112) = 3.67, p = .002, \( {\eta}_p^2 \) = 0.16. Post hoc analyses within the reactivation + regulation group also showed a nonsignificant trend toward a positive correlation between baseline reappraisal use and arousal change, r(28) = 0.33, p = .075: participants who reported more frequent habitual use of reappraisal tended to show a smaller reduction in arousal ratings than those who reported using reappraisal less often.

There was also a significant difference in accuracy across groups, F(3, 115) = 3.19, p = .026, \( {\eta}_p^2 \) = 0.08. This effect appeared to be driven by greater accuracy in the reactivation + regulation group (M = 0.88, SD = 0.12) and the no-regulation group (M = 0.88, SD = 0.12) than in the no-reactivation group (M = 0.82, SD = 0.12) and the neither group (M = 0.80, SD = 0.12). We note that the former two groups viewed the test images (or parts of them) three times, whereas the latter two groups saw each image only twice, which may account for the numeric differences across groups. However, no post hoc pairwise comparison survived correction for multiple comparisons (all ps > .1). Furthermore, accuracy did not correlate with change in arousal across the experiment (r = −.007, p = .936).

Discussion

Using a combined emotion regulation and memory reactivation paradigm, the present study sought to determine whether a cognitive reappraisal tactic—spatial distancing—would effectively reduce long-term emotional reactivity to consolidated aversive memories when conducted during a period of memory lability. Supporting the main hypothesis, participants who voluntarily implemented distancing immediately after memory reactivation reported less emotional arousal to the memories 2 days later compared with control groups who did not regulate, did not reactivate, or did nothing in the interim. Thus, the combination of reactivating and regulating the consolidated emotional memory yielded the strongest subsequent arousal reduction.

The self-reported arousal reduction occurred despite participants’ continued endorsement of the memories as negatively valent. Although prior work has shown that distancing can modulate both arousal and valence ratings when regulation is conducted during initial encoding (Davis et al., 2011), here, we found that arousal is more sensitive to the impact of distancing on reactivated memories. Neurobiological studies have suggested that cognitive reappraisal reduces emotional arousal through prefrontal cortical down-regulation of amygdala activity (Buhle et al., 2014; Ochsner & Gross, 2008). Future studies employing neurofeedback or neurostimulation tools could determine whether targeting these pathways during periods of memory lability provides a novel means of long-term arousal reduction for reactivated emotional memories.

Importantly, the subsequent affective impact of spatial distancing on consolidated memories was not accompanied by overall memory accuracy impairment compared with the control groups. Distancing has been postulated to create more abstract representations of stimuli (Trope & Liberman, 2010), which could, in turn, hinder detailed memory reconstruction and lead to arousal reduction as a secondary outcome. This explanation of the main findings does not appear to hold, although we note that we only tested overall recognition accuracy in the present study. Instead, the results parallel those seen in some memory reactivation studies of fear conditioning, in which changes in subsequent affective responding can occur in the absence of altered declarative memory of the stimulus reinforcement contingencies (Fitzgerald, Seemann, & Maren, 2014).

Future research is needed to determine the extent to which the present findings could translate into clinical settings. Attempts to therapeutically adapt memory reactivation paradigms from the fear conditioning literature have met with limited success, in part because of the numerous boundary conditions on the experimental paradigm itself (Treanor, Brown, Rissman, & Craske, 2017), such as the strength and age of the memory, the duration of reactivation and/or manipulation, and the participant’s anxiety level. Here we provide proof of concept for a different strategy, one that combines memory reactivation and emotion regulation in a way that may selectively reduce arousal to consolidated declarative emotional memories. Given the breadth of available emotion regulation tactics, this combination may prove applicable to both recent and remote memories across a range of populations and applications. If supported by further development and translation, the approach introduced here may yield a novel behavioral tool to help therapists target a partial, accessible aspect of a client’s memories to induce lasting affective change through the principles of regulatory memory modification.