Conversations we have with others routinely involve remembering previously experienced events, from everyday occurrences (Marsh & Tversky, 2004), to extraordinary personal incidents (Harber & Cohen, 2005), to consequential public events (Cohn, Mehl, & Pennebaker, 2004; Coman, Manier, & Hirst, 2009). Unsurprisingly, recounting one’s memory to another person affects the memories of the speaker by reinforcing them (Karpicke & Roediger, 2008). But what is the effect of this recounting on the listener’s memories? Previous research has established that the listener often engages in predicting what the speaker will say based on the listener’s knowledge and the speaker’s previous utterances (Kuperberg & Jaeger, 2016; Pickering & Garrod, 2004). When these predictions are aligned with the speaker’s statements, they can strengthen the predicted (and confirmed) mnemonic representations; but when they are misaligned, the cognitive system will have to swiftly correct the misprediction.

And indeed, prediction errors trigger the suppression of the memories that generated them. To show this, Kim, Lewis-Peacock, Norman, and Turk-Browne (2014) asked participants to study A-B-C stimulus sequences. Following the presentation of these A-B-C sequences, participants saw modified sequences that contained only the first two initially studied items along with a new item (i.e., A-B-D). Critically, the repetition of the A-B pair between the study and the re-presentation phase was aimed at recreating the context in which the participants encountered the A-B-C sequence in the study phase, therefore facilitating the prediction of C. A final test phase evaluating memory accuracy for the initially presented information revealed that the C item was remembered less well than control items (i.e., items presented in the study phase that were not preceded by specific stimulus sequences).

Importantly, Kim et al. (2014) also investigated whether the effect they found was due to suppression triggered by prediction error, as they speculated, or to interference from competing items (Verde, 2004). To do so, they used multivoxel pattern analysis (MVPA). Participants’ brains were scanned both when they were presented with A-B-C sequences in the encoding phase and during the subsequent presentation of the A-B-D sequence. From the encoding phase, the researchers captured the neural signature of the C item. They found that neural evidence of the C item during the presentation of the A-B-D sequence resulted in the suppression of the C item, as assessed in a subsequent recognition test. This pattern is consistent with a prediction-induced weakening of memory, rather than with a simple retroactive interference account. Importantly, not all mispredicted items were suppressed following the prediction phase. The hallmark sign of suppression triggered by context-based prediction error is an association strength effect by which weak A-B-C associations and strong A-B-C associations were found to be relatively immune to the suppression effects of misprediction, with only moderate A-B-C associations exhibiting suppression. This is because C is unlikely to be predicted during the presentation of the A-B-D sequence if it is only weakly associated with A-B, and is very likely to be predicted and, thus, highly activated, if it is very strongly associated with A-B. In both cases, the memory of C should be unaffected by a prediction error phase. Only a moderate association strength was found to lead to the suppression of C (Kim et al., 2014), a forgetting pattern consistent with the nonmonotonic plasticity hypothesis proposed by Newman and Norman (2010).

Given that communication is rife with prediction, and inevitably, prediction errors (Pickering & Garrod, 2004), conversations might create the precise circumstances for prediction errors to trigger suppression in listeners’ memories. We reasoned that if listeners use their own memories to make predictions about what the speaker will recount, then instances in which these predictions are invalidated by the speaker’s recounting should result in the suppression of the memories that generate these predictions. To test this hypothesis, we employed the paradigm used by Kim et al. (2014), but we added a social dimension to it. Participants first encoded 15 A-B-C stimulus sequences. They then listened to an audio of a speaker recalling five of these sequences as complete repetitions (e.g., A-B-C) and five as partial repetitions (e.g., A-B-D). The remaining five initially encoded A-B-C sequences were not presented during the listening task and constituted baseline items. If memory for the C items was better in the complete-repetition sequences than in the no-repetition sequences, then that would indicate a socially triggered repetition effect (ST-RE). If memory for the C items was worse for the partial-repetition sequences than for the no-repetition sequences, then that would be evidence of a socially triggered context-based prediction error effect (ST-CBPE).

The main goal of the study was to establish whether CBPE can be socially modulated. That is, are there social situations in which individuals are more likely to engage in predictions based on their own experiences? We reasoned that increasing the demands to take the perspective of the speaker should result in a higher likelihood of using one’s experiences for prediction (Batson, Early, & Salvarani, 1997; Epley, 2014). We therefore manipulated the degree to which individuals engaged in perspective taking during the listening task (low vs. high), and we measured whether this differential engagement impacted memory suppression following context-based prediction errors.

Method

Participants

One hundred Princeton students participated in the study, a sample size selected to detect an effect size of .40 with a power of .80. One participant was excluded due to at-chance performance in the recognition task. Participants were randomly assigned to either the low-perspective-taking condition (n = 49) or the high-perspective-taking condition (n = 50). The average age of the sample was 20.61 years (SD = 4.57), and 62% of the participants were female.

Stimulus materials

In the incidental encoding phase, participants completed a two-wave sequential decision task. For each trial, in Wave 1, participants were presented with a name and image of a European capital city (e.g., London), and were asked to decide between two locations (e.g., bar vs. restaurant), indicating their preference for where to go if they found themselves in that particular capital city. Once they made their decision (e.g., bar), for Wave 2, they were presented with two options associated with this choice (e.g., brandy or whiskey) and had to choose between them (e.g., brandy). This procedure created 15 A-B-C decision sequences (e.g., London-bar-brandy). To generate these sequences, we assembled 15 city stimuli, 15 triads of location stimuli, and 30 triads of item stimuli. Each city, location, and item stimulus consisted of its name and a representative image. We programed an algorithm to randomly select a capital city, then pseudorandomly select one pair of locations (two locations randomly chosen from three possible locations), and then, depending on the selected location, to present a pair of items (two items randomly chosen from three possible items) contextually relevant for the selected location.
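The selection procedure described above can be sketched as follows. This is a minimal illustration, not the authors' program: the stimulus pool, the function names, and the fixed seed are assumptions, and only London, bar, restaurant, brandy, whiskey, and wine come from the text.

```python
import random

# Hypothetical stimulus pool. Only London, bar, restaurant, brandy, whiskey,
# and wine come from the text; every other name is made up for illustration.
STIMULI = {
    "London": {
        "bar": ["brandy", "whiskey", "wine"],
        "restaurant": ["steak", "pasta", "salad"],
        "market": ["apple", "cheese", "bread"],
    },
}


def draw_location_pair(city, rng):
    """Wave 1: show two of the three candidate locations for this city."""
    return rng.sample(sorted(STIMULI[city]), 2)


def draw_item_pair(city, chosen_location, rng):
    """Wave 2: show two of the three items tied to the chosen location.

    Returns the displayed pair and the held-out third item, which is the
    one a partial-repetition trial can later use as the novel ending.
    """
    triad = STIMULI[city][chosen_location]
    shown = rng.sample(triad, 2)
    held_out = next(item for item in triad if item not in shown)
    return shown, held_out


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
city = rng.choice(sorted(STIMULI))
locations = draw_location_pair(city, rng)
chosen_location = locations[0]  # stands in for the participant's choice
shown_items, held_out_item = draw_item_pair(city, chosen_location, rng)
```

Holding out the third item at encoding is what makes it possible for a later partial repetition to end on an item the participant never saw in that sequence.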

In the listening phase, participants listened to a gender-matched audio recording of 10 sequences. To assemble the audio materials for this phase, we recorded two confederates (one woman, one man) following a preestablished template. The confederates read sentences describing activities they purportedly engaged in when they visited the various cities (e.g., “When I visited London, I went to a bar and drank a glass of whiskey”). The images associated with each city, location, and item were also shown on the computer screen simultaneously with their audio presentation. Importantly, the listening phase involved the presentation of five sequences that constituted complete repetitions relative to the participants’ selections (e.g., London-bar-brandy) and five sequences that constituted partial repetitions (e.g., London-bar-wine). For partial repetitions, the final item in the sequence (e.g., wine) was the item in the triad that was not presented during the initial encoding phase. Five initially constructed sequences were not presented in the listening phase and constituted the no-repetition trials.
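One way to picture the split of the 15 encoded sequences into listening-phase trial types is the sketch below. The text does not say how sequences were allotted to trial types, so the random shuffle is an assumption, as are the function name and the dictionary keys.

```python
import random


def assign_listening_trials(sequences, rng):
    """Split 15 encoded sequences into 5 complete, 5 partial, 5 no-repetition.

    Assumption: allocation is random. Each sequence is a dict holding the
    participant's chosen item and the held-out third item of the triad.
    """
    shuffled = list(sequences)
    rng.shuffle(shuffled)
    trials = []
    for index, seq in enumerate(shuffled):
        if index < 5:
            # Speaker repeats the participant's own A-B-C sequence.
            trial_type, spoken = "complete", seq["chosen_item"]
        elif index < 10:
            # Speaker repeats A-B but ends on the item never shown at encoding.
            trial_type, spoken = "partial", seq["held_out_item"]
        else:
            # Sequence is not mentioned in the listening phase (baseline).
            trial_type, spoken = "none", None
        trials.append({**seq, "trial_type": trial_type, "spoken_item": spoken})
    return trials


# Fifteen placeholder sequences standing in for one participant's choices.
sequences = [
    {"city": f"city{i}", "location": f"loc{i}",
     "chosen_item": f"chosen{i}", "held_out_item": f"novel{i}"}
    for i in range(15)
]
trials = assign_listening_trials(sequences, random.Random(7))
```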

Design and procedure

Participants went through five phases: (1) incidental encoding, (2) perspective-taking manipulation, (3) listening phase, (4) test phase, and (5) speaker’s statement test (see Fig. 1). For distracter tasks, unrelated questionnaires (e.g., Need for Cognition) were inserted between any two phases. During incidental encoding, participants completed 15 two-wave forced-choice trials, as described above. Participants took as long as they needed to make each choice (Wave 1: M = 4,480 ms; SD = 3,138 ms; Wave 2: M = 3,195 ms; SD = 2,054 ms), with an intersequence interval of 1 s. For the perspective-taking manipulation phase, they were told that they would listen to an audio of another person retrieving some of the studied sequences. They were then randomly assigned to either the low-perspective-taking (Low-PT) condition or the high-perspective-taking (High-PT) condition. For the Low-PT condition, they were instructed to “take an objective perspective toward what the speaker in the audio describes” and to “remain objective” throughout the listening task. For the High-PT condition, they were asked to “try to visualize the person in the particular situation” and to “imagine how this person felt as the experience unfolded.” Next, in the listening phase, participants listened to a speaker describing 10 sequence actions (five complete-repetition trials and five partial-repetition trials; see Fig. 1) and were told to monitor the speaker’s utterances. The order of the sequences was random, and the intersequence interval was set at 1 s, as in the encoding phase.

Fig. 1

Experimental procedure. Participants first go through the incidental encoding phase in which they are presented with a capital city (e.g., London), are asked to select between two locations (restaurant vs. bar), and are then asked to select between two items (whiskey vs. brandy) associated with the selected location (bar). Red contours = the chosen sequence. Participants are then randomly assigned to one of two perspective-taking conditions. They then go through the listening phase, which involves listening to five complete-repetition trials and five partial-repetition trials. A final recognition test phase follows. The speaker statement recognition test is not shown here due to space limitations (but see Fig. 2). (Color figure online)

For the test phase, participants were presented with single images and had to indicate whether they had selected that item in the encoding phase. They gave their answer on a 4-point scale, as in Kim et al. (2014) (1 = definitely did not select this to 4 = definitely selected this). In the test phase, participants were presented with 60 items in random order: the 15 items they chose during the encoding phase, the 15 items that were paired with the chosen ones during the encoding phase but were not chosen themselves, and 30 new items.

Finally, for a speaker’s statement recognition test, participants were presented with the 10 correct (single) items the speaker chose and an additional 20 new items and were asked to indicate, for each item, whether the speaker had mentioned the item in the listening phase. For each item, they answered on a 4-point scale (1 = definitely did not mention this to 4 = definitely mentioned this; see Fig. 2).

Fig. 2

Illustration of a trial correctly recognized with respect to speaker’s responses (top) and a trial incorrectly recognized with respect to speaker’s responses (bottom). In the final test phase, in which the participant was asked to indicate what the speaker had mentioned in the listening phase, the participant correctly identified that the speaker mentioned “wine” (top), but incorrectly identified that the speaker did not mention “apple” (bottom). An interference account of the results predicts that participants should remember items in the chosen sequence (in the test-self phase) to a lesser extent if the participant correctly recognized items mentioned by the speaker than if the participant incorrectly recognized the items mentioned by the speaker (in the test-speaker phase). (Color figure online)

Coding

Initially selected items for which the participant indicated that they definitely selected this item in the test phase were coded as correct. All the other options for these initially selected items were coded as incorrect. Similarly, we coded correct rejections as the items that the participants did not choose initially (competitors) or were new to participants (lures) and for which they indicated that they definitely did not select this. Of note, requiring high confidence for an item to be considered remembered is recommended by prior studies (Kim et al., 2014; Wagner et al., 1998). This is because including less confident answers in recognition tests results in ceiling effects and thus reduces the sensitivity of the measurement. Including less confident answers as correct (i.e., probably selected this) produces a similar pattern of results, but, because of the reduced sensitivity of the measurement, the comparisons do not reach conventional statistical significance levels.
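The high-confidence coding rule above can be written as a small function. This is a sketch: the function name and the item-type labels are assumptions chosen for illustration, while the thresholds (a rating of 4 for selected items, a rating of 1 for competitors and lures) follow the text.

```python
def code_response(item_type, rating):
    """Code one recognition response under the high-confidence criterion.

    item_type: "selected", "competitor", or "lure" (labels are illustrative).
    rating: 1 (definitely did not select this) to 4 (definitely selected this).
    """
    if rating not in (1, 2, 3, 4):
        raise ValueError("rating must be an integer from 1 to 4")
    if item_type == "selected":
        # Only 'definitely selected this' counts as correct.
        return "correct" if rating == 4 else "incorrect"
    if item_type in ("competitor", "lure"):
        # Only 'definitely did not select this' counts as a correct rejection.
        return "correct_rejection" if rating == 1 else "incorrect"
    raise ValueError(f"unknown item type: {item_type}")
```

Under this rule a rating of 3 (probably selected this) on a selected item is coded as incorrect, which is what keeps the measure away from ceiling.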

We computed D′ scores by using the hit rates and correct rejection rates for the competitors. This decision was made because the lures exhibited a ceiling effect (i.e., correct rejection for 98% of lures).
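For readers unfamiliar with the measure, a standard signal-detection computation of D′ from a hit rate and a correct rejection rate is sketched below. The text does not report how rates of 0 or 1 were handled, so the 1/(2N) adjustment here is an assumption, as are the function and parameter names.

```python
from statistics import NormalDist


def d_prime(hit_rate, correct_rejection_rate, n_targets, n_competitors):
    """D' = z(hit rate) - z(false alarm rate).

    The false alarm rate is taken as 1 minus the correct rejection rate.
    Assumption: rates of exactly 0 or 1 are moved to 1/(2N) or 1 - 1/(2N),
    a common convention, because z is undefined at 0 and 1.
    """
    false_alarm_rate = 1.0 - correct_rejection_rate

    def clamp(rate, n):
        lower, upper = 1.0 / (2 * n), 1.0 - 1.0 / (2 * n)
        return min(max(rate, lower), upper)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(clamp(hit_rate, n_targets)) - z(clamp(false_alarm_rate, n_competitors))


# Example with five targets and five competitors in a condition.
example = d_prime(0.8, 0.8, n_targets=5, n_competitors=5)
```

With only five items per cell, the adjustment caps the attainable D′, so the specific convention matters when comparing values across studies.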

Results

We performed separate analyses for the repetition effect (RE) and the context-based prediction error (CBPE) effect. For the RE, we conducted a repeated-measures ANOVA, with Item Type as a within-subjects factor (Complete repetition vs. No repetition) and Condition (High PT vs. Low PT) as a between-subjects factor. D′ constituted our dependent variable. We found no main effect of Item Type, F(1, 97) = .10, p = .75, ηp2 = .001; Condition, F(1, 97) = .149, p = .70, ηp2 = .002; or their interaction, F(1, 97) = 2.11, p = .149, ηp2 = .021. One possible explanation for the failure to find a repetition effect could be a ceiling effect for the hits. The average recognition rates for the complete-repetition trials and the no-repetition trials were M = .95 (SD = .11) and M = .93 (SD = .12), respectively.

To examine our main interest, the CBPE effect on memory, we conducted a repeated-measures ANOVA, with Item Type as a within-subjects factor (Partial repetition vs. No repetition) and Condition (High PT vs. Low PT) as a between-subjects factor. As in the previous analysis, D′ was the dependent variable. We found a main effect of Item Type, F(1, 97) = 11.01, p < .001, ηp2 = .102, but not of Condition, F(1, 97) = .002, p = .96, ηp2 = .00. We found a marginally significant effect for the interaction between Item Type and Condition, F(1, 97) = 3.82, p = .053, ηp2 = .038. Given that we had specific hypotheses for the difference between conditions, we conducted post hoc analyses to compare the D′ for the partial-repetition and the no-repetition trials, separately for each of the two perspective-taking conditions. Of note, conducting post hoc analyses to investigate hypothesized effects is considered appropriate even in the absence of a significant interaction (Greenland, 1983; but see Nieuwenhuis, Forstmann, & Wagenmakers, 2011, for a counterargument to this approach). Paired-samples t tests showed that there was a CBPE effect in the High-PT condition, t(49) = 4.41, p < .001, d = .62, CI[.18, .47] (Partial-repetition trials, M = 1.97, SD = .59; No-repetition trials, M = 2.30, SD = .49). No CBPE effect was found for the Low-PT condition, t(48) = .84, p = .40, d = .12, CI[−.12, .28] (Partial-repetition trials, M = 2.10, SD = .44; No-repetition trials, M = 2.18, SD = .52; see Fig. 3).
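The paired comparisons can be illustrated with a short sketch. The reported effect sizes are numerically consistent with d computed as the mean paired difference divided by the standard deviation of the differences, which equals t divided by the square root of n (4.41/√50 ≈ .62 and .84/√49 = .12); that the authors used this formula is an inference, and the function name and sample values below are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_and_d(condition_a, condition_b):
    """Paired-samples t test statistic, degrees of freedom, and effect size.

    Assumption: the effect size is d_z, the mean of the paired differences
    divided by their sample standard deviation, so that d_z = t / sqrt(n).
    """
    differences = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(differences)
    d_z = mean(differences) / stdev(differences)
    t = d_z * sqrt(n)
    return t, n - 1, d_z


# Made-up D' scores for four participants (no-repetition vs. partial-repetition).
no_repetition = [2.0, 3.0, 4.0, 5.0]
partial_repetition = [1.0, 1.0, 3.0, 3.0]
t_value, df, effect_size = paired_t_and_d(no_repetition, partial_repetition)
```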

Fig. 3

Mean D′ scores by Item Type and Condition. On the y-axis, D′ scores, for the different item types: Complete-repetition trials (black), Partial-repetition trials (white), No-repetition trials (gray). The larger the D′, the better the memory. Note that the y-axis truncates the scale to begin at 1.5 to better showcase the differences between conditions. See Supplementary Fig. 1 for the hits and correct rejections used to compute these D′ scores. Error bars represent ±1 standard error around the mean.

An alternative explanation could involve a difference in cognitive effort deployed during the listening task between the two conditions. More specifically, participants in the High-PT condition might have engaged in more cognitive effort (to monitor the speaker) than those in the Low-PT condition. Importantly, a difference in cognitive effort between the two conditions would only affect the complete-repetition and the partial-repetition trials. According to this explanation, memory scores should be lower in the High-PT condition than in the Low-PT condition. An ANOVA with Item Type (Complete repetition vs. Partial repetition) as a within-subjects variable and Condition (High PT vs. Low PT) as a between-subjects factor did not reveal a significant effect for Condition, F(1, 97) = 1.18, p = .281, ηp2 = .012. In addition, a cognitive effort explanation would also predict that memory scores for statements mentioned by the speaker should be different in the High-PT condition than in the Low-PT condition: lower if the listening task is more difficult to perform, higher if the listening task results in the increased deployment of attention. In fact, the proportion of errors in the two conditions was equivalent, t(97) = .316, p = .75, d = .06 (MHigh-PT = .24; MLow-PT = .23).

Could the pattern we obtained be explained by interference? That is, the item provided by the speaker in the listening task (e.g., wine) might retroactively interfere with the participant’s initially encoded item (e.g., brandy) if there is an inconsistency between these items (as is the case for the Partial-repetition trials). To check for this possibility, using the participants’ recognition in the final speaker statement test, we investigated whether the participant’s hit rates for their own choices were significantly different between items that were correctly recognized as mentioned by the speaker and those incorrectly recognized as mentioned by the speaker (see Fig. 2 for an illustration of the analysis). An interference account would predict lower hit rates for trials for which participants correctly recognized the speaker’s responses than for trials for which they incorrectly recognized the speaker’s responses. We thus conducted paired-samples t tests to compare the participant’s hit rates (for their own memories) for items correctly recognized as mentioned by the speaker with the hit rates (for their own memories) for items incorrectly recognized as mentioned by the speaker. Neither for the High-PT condition, t(97) = .087, p = .72, d = .01, nor for the Low-PT condition, t(97) = −.089, p = .87, d = .01, was there a significant difference between the correctly recognized and the incorrectly recognized items. Similar results were obtained with the correct rejection rates, High PT: t(97) = −.478, p = .47, d = .07; Low PT: t(97) = .413, p = .54, d = .05 (see Supplementary Fig. 2).

Our previous analysis, even though suggestive, is not conclusive in arguing against an interference account. A more direct demonstration that the pattern we obtained is indeed due to prediction errors would be an association strength effect. Derived from previous work on the nonmonotonic plasticity hypothesis (Newman & Norman, 2010), this association strength effect predicts reduced recognition rates following prediction error for sequences for which the items were moderately associated with one another, and not for loosely associated or strongly associated sequences (i.e., U-shaped pattern). This is because only the moderately associated items should be predicted by participants to the extent that might trigger suppression during the listening phase. An interference account would predict a linear relation between recognition rates and the level of item associations in the sequence: the stronger the item association, the better the recognition rate. Loosely associated items are likely to be displaced by new items presented in the listening phase, while strongly associated items should be insensitive to such interference. To explore whether our data exhibits an association strength effect, we asked an independent group of 172 Mechanical Turk participants (Mage = 36.13 years, SD = 10.82, 52% female) to evaluate the associative strength of all the sequences that the participants in the lab study were exposed to (i.e., all A-B-C sequences). Each such sequence was evaluated by a subset of approximately 60 MTurk participants on how strongly associated the items in the sequence are on a scale from 1 (not at all) to 7 (very much so). Based on the average associative strength rating of the Mechanical Turk participants, for each participant in the lab study we ranked the 5 A-B-C sequences that were part of the Partial-repetition condition from the lowest associated sequence to the highest associated sequence. Separately, we employed a similar ranking for the five items that were part of the No-repetition condition for comparison. We next computed the hit rates of the C item depending on the level of associative strength of the sequence.
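The ranking step can be sketched as follows. The function names, the data layout, and the rating values are hypothetical, and how tied ratings were resolved is not stated in the text, so the stable sort used here is an assumption.

```python
def rank_by_association(sequence_ids, mean_rating):
    """Rank one participant's five sequences from 1 (lowest) to 5 (highest).

    mean_rating maps a sequence id to its average associative strength
    rating. Assumption: ties keep their input order (stable sort).
    """
    ordered = sorted(sequence_ids, key=lambda seq: mean_rating[seq])
    return {seq: rank for rank, seq in enumerate(ordered, start=1)}


def hit_rate_by_rank(hits_per_participant):
    """Average hit (1) or miss (0) on the C item at each rank across participants.

    hits_per_participant is a list of dicts mapping rank -> 0 or 1.
    """
    ranks = sorted(hits_per_participant[0])
    n = len(hits_per_participant)
    return {r: sum(p[r] for p in hits_per_participant) / n for r in ranks}


# Hypothetical ratings on the 1-7 scale for one participant's five sequences.
ratings = {"s1": 5.9, "s2": 3.1, "s3": 4.4, "s4": 6.5, "s5": 2.2}
ranks = rank_by_association(["s1", "s2", "s3", "s4", "s5"], ratings)
rates = hit_rate_by_rank([
    {1: 1, 2: 1, 3: 0, 4: 1, 5: 1},
    {1: 1, 2: 0, 3: 0, 4: 1, 5: 1},
])
```

Because ranking is done within each participant, every participant contributes exactly one C item to each of the five association levels.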

A polynomial contrast performed with a repeated-measures ANOVA, with Association Strength as an independent variable and recognition rates as a dependent variable, revealed that the effect was quadratic in the Partial-repetition condition, F(1, 98) = 3.83, p = .053, ηp2 = .038, and linear in the No-repetition condition, F(1, 98) = 4.42, p = .038, ηp2 = .043 (see Fig. 4). This pattern offers support for a mechanism involving suppression triggered by prediction error and, we contend, is incompatible with an interference account.
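As a rough guide to what the polynomial contrasts test, the sketch below applies the standard orthogonal linear and quadratic weights for five equally spaced levels to each participant's recognition scores and tests the resulting contrast scores against zero. For a single within-subjects factor this yields an F with (1, n − 1) degrees of freedom, matching the F(1, 98) reported for 99 participants. The function name and the data are hypothetical, and this is not necessarily the software routine the authors ran.

```python
from math import sqrt
from statistics import mean, stdev

# Orthogonal polynomial weights for five equally spaced levels.
LINEAR = (-2, -1, 0, 1, 2)
QUADRATIC = (2, -1, -2, -1, 2)


def contrast_f(scores, weights):
    """F test of a within-subjects contrast.

    scores: one row per participant, five recognition values ordered from
    the lowest to the highest association rank.
    Returns (F, (df_numerator, df_denominator)), where F is the squared
    one-sample t statistic of the per-participant contrast scores.
    """
    contrast_scores = [sum(w * x for w, x in zip(weights, row)) for row in scores]
    n = len(contrast_scores)
    t = mean(contrast_scores) / (stdev(contrast_scores) / sqrt(n))
    return t * t, (1, n - 1)


# Made-up hit data (1 = hit, 0 = miss) with a dip at the middle rank.
data = [
    (1, 0, 0, 0, 1),
    (1, 1, 0, 1, 1),
    (1, 0, 0, 1, 1),
    (0, 0, 0, 0, 1),
]
f_quadratic, df_quadratic = contrast_f(data, QUADRATIC)
f_linear, df_linear = contrast_f(data, LINEAR)
```

A reliable quadratic contrast with a weak linear one corresponds to the U-shaped pattern, whereas the reverse corresponds to a steady increase with association strength.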

Fig. 4

Recognition scores (for hits) as a function of associative strength of the sequence that contained the C item. The partial-repetition items exhibit an association strength effect consistent with suppression triggered by prediction error (i.e., quadratic). The no-repetition items exhibit an expected linear increase as a function of association strength. Error bars represent ±1 standard error around the mean.

General discussion

In the current study, we showed that listening to another person describing their experiences results in the suppression of one’s own memories. This was demonstrably due to the mnemonic pruning that accompanies prediction errors generated based on one’s own memories during listening. The fact that the additional analyses were not consistent with the most plausible alternative account (i.e., interference) gives us confidence that we are indeed capturing a mnemonic suppression mechanism. Moreover, a retroactive interference account cannot explain the difference between the high and low perspective-taking conditions, a pattern easily accommodated by a context-based prediction error account.

It is important to note, however, that we are not claiming that the process by which one’s memories are suppressed is fundamentally social. Individuals might simply be more likely to use their own experiences to make predictions in certain situations. The particular (social) situation that we investigated was postulated to increase the rate of prediction. Indeed, there are nonsocial factors that could affect the degree to which one engages in self-referential prediction during the listening task (e.g., the success rate of previous attempts; the similarity of the experiences between speaker and listener). Our study simply shows that CBPE affects people’s memories during listening and that the effect is socially modulated.

This social modulation opens intriguing research opportunities. For instance, what are the minimal conditions that trigger ST-CBPE? When one is motivated to differentiate oneself from the interacting partner, one consequence of such differentiation could be the blocking of prediction attempts, which, in turn, would insulate one’s memories from distortion (Fiske, 2004). Paradoxically, situations in which the listener has a lot of information about the speaker would similarly protect one’s memories, since the model used for prediction might not be self-referential, but other-referential. In other words, one would expect ST-CBPE in situations in which individuals are both motivated to relate to a conversational partner and for which they do not already have a model that could be used as a basis for prediction.

Another research trajectory that is spurred by the current investigation is in exploring the neural dynamics involved in ST-CBPE. This would not only unambiguously clarify the mechanism responsible for ST-CBPE, but would also allow for capturing the neural instantiation of a dynamical cognitive process. Recent methodological developments in neuroscience (i.e., multivoxel pattern analysis; Lewis-Peacock & Norman, 2014) and theoretical advances in social neuroscience (Tamir & Thornton, 2018) would provide the necessary scaffold one would need to investigate the neural processes involved in ST-CBPE. These investigations, at both the behavioral and the neural levels of analysis, could then inform research into psychological disorders where prediction errors could have meaningful consequences, such as autism (Baron-Cohen, Tager-Flusberg, & Lombardo, 2013) and anxiety disorders (White et al., 2017).

Finally, the present research uncovers an unrecognized influence of speakers during communication that could have substantial practical implications (Schacter, 2001). Jury deliberations could lead to the suppression of previously encoded memories if they involve, as they often do, predictions of the speaker’s utterances. Group therapy that involves recounting of thematically similar events could shape the participants’ memories in unintended ways. And newscasters could influence the memories of their audiences, especially when reinterpreting already encoded events.