How do people react when they receive disconfirmatory social feedback about events that they remember vividly and accurately? Do they discount the feedback and defend their memory, or alternatively decide to reduce belief in their memory? Functional approaches to remembering propose that autobiographical memory serves to support a variety of social and communicative functions. Bartlett (1932) is credited with making some of the earliest claims that the ways in which events are remembered, as well as the content of what is remembered, are socially determined. Prominent theories of episodic memory (Tulving, 1983) and the metacognitive regulation of memory (Koriat & Goldsmith, 1996) emphasize the central role of retrieval contexts in shaping memory search and subsequent reporting. In other words, what a person remembers (internally) and what a person chooses to report are partly determined by the reason(s) that a particular act of remembering happens. Given that remembering often occurs in social contexts and for social purposes, the retrieval context is frequently affected by social factors (Blank, 2009; Bluck, Alea, Habermas, & Rubin, 2005; Foley, 2015; Hirst & Echterhoff, 2012; Hyman & Faries, 1992; Mahr & Csibra, 2017).

Social communication about remembered events sometimes involves dialogue about the veridicality of recall. For example, Hirst and Echterhoff (2012) argued that people routinely discuss past experiences and review some of the benefits (e.g., collaborative facilitation, transactive memory) and costs (e.g., collaborative inhibition, audience tuning) for subsequent memory that are associated with conversations about past events. When receiving disconfirmatory social feedback about memories, people sometimes defend and sometimes reduce their belief that the recalled event occurred. Research on contested memories has shown that in response to contradictory claims made by others, individuals may maintain their claims that the events actually happened to them (Sheen, Kemp, & Rubin, 2001, 2006). Conversely, data on naturally occurring nonbelieved memories (NBMs) show that individuals sometimes reduce their belief in the occurrence of vividly remembered events in response to disconfirmatory social feedback (Mazzoni, Scoboria, & Harvey, 2010; Otgaar, Scoboria, & Mazzoni, 2014; Scoboria, Boucher, & Mazzoni, 2015). In these studies, participants were asked to report personal memories of events they no longer believed happened to them and to specify the reason(s) for belief withdrawal. Disconfirmatory social feedback was the most frequently reported reason across a number of studies. Subsequent work on spontaneous NBMs has shown that the degree to which belief in occurrence is affected when memories for events are challenged can vary substantially (Scoboria, Nash, & Mazzoni, 2017).

The studies reviewed to this point have been based on reports about existing, naturally occurring, personal past memories (whether or not said memories were objectively accurate). To document the extent to which belief in occurrence for correctly remembered events is affected by social input, greater control over the encoding and retrieval conditions is required in order to understand the frequency and intensity with which social feedback results in changes to belief, and to identify factors that influence belief change. Greater experimental control can be gained by using laboratory analogues of personal memories, as in the published studies described next and the two new studies reported here.

Several studies have examined memories (and more specifically false memories) for actions in lab settings, and some have collected remember–know ratings (e.g., Thomas, Bulevich, & Loftus, 2003; Thomas & Loftus, 2002) that potentially overlap with recollection/belief-in-occurrence judgments (for a discussion of the theoretical differences between these approaches, see Mazzoni & Scoboria, 2007; Scoboria et al., 2014). Other studies have found that observing other people perform actions can lead to false memories of having self-performed the actions (Linder, Echterhoff, Davidson, & Brand, 2010). The relationships between remember/know judgments and autobiographical belief and recollection have yet to be examined in depth in the literature. Scoboria and Talarico (2013) proposed that conceptualizations of “remember” and “know” include different processes that potentially contribute to autobiographical belief and/or recollection judgments. The study by Clark, Nash, Fincham, and Mazzoni (2012) was the first to examine the effects of social feedback about false memories for actions performed in the lab. Participants were videotaped while imitating actions performed by an experimenter. Later, participants viewed a video that had been doctored to show the experimenter performing actions that the participant had not in fact imitated. This procedure resulted in high levels of false belief and false recollection for these suggested actions. When belief and recollection ratings were taken once again following debriefing about the deception, belief ratings for the suggested actions decreased to a greater extent than recollection ratings, indicating that social feedback has a greater impact on strength of belief in occurrence than on strength of recollection. Mazzoni, Clark, and Nash (2014) replicated these findings when providing feedback about actions that had genuinely been performed. 
Other studies have found that self-report items that ask participants to rate the degree to which an event is remembered versus known cluster with items that tap recollection (Scoboria & Pascal, 2016; Scoboria & Talarico, 2013).

Using a false-memory implantation methodology, Otgaar, Scoboria, and Smeets (2013) found that of 32 individuals who developed a memory for a suggested false event (a hot air balloon ride as a child) in the lab, 19 retracted all memory claims when informed about the deceptive procedure, 12 reported a loss of belief that the event occurred with sustained recollection, and one continued to claim to have a memory for the suggested event.

Whereas these lab-based studies (Clark et al., 2012; Mazzoni et al., 2014; Otgaar et al., 2013) focused primarily on whether nonbelieved memories (NBMs) developed in response to social feedback about believed memories (for both objectively true and false memories), in the present studies we explored the effects of social feedback on both belief reduction (cases in which belief for a memory is reduced) and memory defense (cases in which a memory remains believed).

A methodology that is well suited to this purpose is the imagination inflation methodology for actions. In this procedure, originally developed by Goff and Roediger (1998), participants hear simple action statements (e.g., break the toothpick) and, for some actions, either imagine or perform them. In a second session (e.g., one week later), participants imagine several actions, some of which were presented during the first session. Then, during a final phase, participants are asked to recognize whether action statements had been presented during the first session and, if so, to state whether they had heard, imagined, or performed each. The canonical finding is that imagining actions during the second session leads participants to make higher likelihood ratings than during the first session. This effect has been called imagination inflation. This paradigm has since been used in many variants, for example with bizarre actions (e.g., sit on the dice; Thomas & Loftus, 2002) or with participants observing other people performing actions (Linder et al., 2010). In general, these studies show that the imagination or observation of actions can result in false memories of having performed those actions.

We extended a study by Otgaar, Scoboria, Howe, Moldoveanu, and Smeets (2016), who adapted the Goff and Roediger (1998) imagination inflation procedure to incorporate challenges regarding recall. In their two studies, adult (N = 39) and child (N = 31) participants completed three sessions. First, they performed, imagined, or heard a series of 72 simple actions (e.g., bounce the ball). One week later, they repeatedly imagined a subset of the actions. After another week, they completed a recognition and source monitoring test. During this test participants were told that some actions correctly recalled as performed (true memories) and some actions incorrectly recalled as performed (false memories) had not been originally performed. Belief in occurrence and recollection (measured as dichotomous yes/no responses) were recorded after each challenge. In both studies, memory defense and memory relinquishment were common, and some participants showed a greater tendency to defend, and others to relinquish, a belief. Furthermore, challenges to true memories were less likely to lead to relinquishment of belief than challenges to false memories. They also found that some participants always defended, some always relinquished, and some showed a mixture of responses, thus providing preliminary evidence that there may be interesting individual differences in the propensity to reject or accept social feedback about the veridicality of memories.

Otgaar et al. (2016) showed that challenges can lead to both defense and relinquishment of belief. However, their design presents a number of limitations. Most notably, the number of challenges to events provided per participant varied (e.g., from two to nine in Study 1), which led to differences in the individual contribution made by each participant to the findings. This limits the ability to confidently estimate the degree to which challenges influenced other variables, and also to ascertain the existence of individual differences in the propensity to defend or relinquish belief. Second, the yes/no response emphasized a dichotomy of defense versus relinquishment of belief, which precluded assessing the gradient in change (from slight reduction to complete relinquishment of belief) that has been observed in previous studies (e.g., Scoboria et al., 2017). Belief in occurrence is frequently assessed using a continuous scale (Scoboria et al., 2014), which provides a richer pattern of responses. Third, belief and recollection ratings were taken only for items that were rated as “performed” during the final recognition test, precluding comparison of ratings of actions that were and were not challenged.

In the two exploratory studies reported here, we built on Otgaar et al.’s (2016) work with the goal of advancing understanding of how people respond to socially presented challenges for correct memories for performed actions in an experimental analogue. The key changes that we made to the design of the studies were the following: (1) providing an equal number of challenges to participants so that each participant would contribute equally to the results, and rates of defense and reduction could be compared across participants; (2) measuring belief in occurrence and recollection using continuous rating scales, which permits estimation of the degree to which belief and recollection are affected by challenges; (3) taking ratings for all items presented during the recognition test to have control items to assess spontaneous changes to belief that are not due to the feedback. In addition, we only looked at the effect of disconfirming feedback on correct memories for performed actions because we wanted to assess the fate of strongly recollected items.

We expected to see variability in postchallenge belief ratings, and more so for challenged than for nonchallenged performed actions. We predicted that some of these strongly recollected memories would be defended, but also expected to observe cases in which belief was lower after challenges. Given that we provided the same number of challenges per participant, we anticipated that we would be better able to estimate the frequency with which participants accept or reject challenges, and also examine the individual propensity to defend versus reduce belief. Additionally, the use of a continuous scale in conjunction with the initial high belief ratings typically assigned to correctly recollected performed actions makes it possible to explore the gradient in belief change as a result of the social challenge.

Study 1

Method

Participants

Forty-five undergraduate students completed the study in exchange for course credit. We recruited slightly over 40 participants, because the preceding study in which a similar method was used (Otgaar et al., 2016, Study 1) demonstrated effects of challenges on memory reports with 39 adult participants. Demographic characteristics were: Mage = 23.38 years, SD = 6.66, range 18 to 47, 72% female.

Materials

Actions

One hundred thirteen action statements from Goff and Roediger (1998) were used for the pool of actions. Forty-five actions involved an external object presented to the participant (e.g., “look through the magnifying glass”) and 45 actions did not require an object (e.g., “touch your knee”). The final recognition test included an additional 23 filler actions, for a total of 113 action statements.

Ratings

The actions were rated using five items, taken from previous work on memory appraisals (Scoboria et al., 2014). These included belief in occurrence (How likely is it that you did in fact perform this action? 1 Definitely did not perform; 7 Definitely performed); recollection (Do you actually remember performing the action? 1 No memory at all; 7 Clear and complete memory); visual imagery (When I think about performing this action it involves visual details. 1 Not at all, 7 Very Much); vividness (The clarity of my performing this action in my mind is: 1 Not at all clear, 7 Extremely clear); and reexperiencing (While thinking about this action, I feel as though I am reliving performing it. 1 Not at all, 3 Vaguely, 5 Distinctly, 7 As clearly as if it were happening now).

Procedure

Participants were tested individually in two sessions that were conducted one week apart: Session 1 included encoding and imagination (1.5 h), and Session 2 included the recognition test and memory challenges (1.5 h) (see Fig. 1). Note that in preceding studies the encoding and imagination phases were separated by one week. Because we were not interested in the effects of imagination in the present study, we opted to hold the two phases on the same day, retained the repeated imagination phase to introduce source confusion, and took steps to emphasize the different phases of the study to the participants (see below). This design choice had the additional benefit of allowing the study to be conducted in two instead of three sessions, facilitating participant retention.

Fig. 1 Procedures for Studies 1 and 2

In Session 1, participants were told that they would hear a series of action statements. For some, they would perform the action, for some they would imagine the action, and for some they would hear the action statement and solve basic math problems (to interrupt rehearsal). In total, 72 actions were presented (24 performed, 24 imagined, 24 heard/math). Each action was read aloud, and participants continuously performed the action, imagined the action, or solved math problems for 15 s. For imagined actions, participants were told “After hearing each action you should imagine yourself performing it. It is possible that you will hear some action statements more than once. You should imagine performing the action each time you hear it.” For the 45 actions involving an object, the objects were arranged out of sight on a stand next to the experimenter and were presented individually when participants had to perform (using the object) or imagine (with the object visible) the action.

After the encoding phase, participants were told that they would take a 15-min break, following which they would complete the next phase of the study. Participants were encouraged to take a walk, and snacks were provided to emphasize that the study was shifting to a different phase. Following the break, all participants engaged in repeated imagination. Thirty-six actions, 18 from Session 1 (six performed, six imagined, six heard) and 18 new actions, were presented three times each in a fixed, random order, for a total of 108 imaginings. Actions were imagined for 12 s apiece.

Session 2 (recognition test and memory challenges) took place one week later. One hundred and thirteen actions (72 original actions, 18 actions from the repeated imagining part, 23 new actions) were presented in a predetermined random order. Participants were instructed to respond only on the basis of the presentation of actions in the first part of Session 1 when actions were performed, imagined, or heard. Each action statement was read aloud by the experimenter, who recorded all responses. First, the experimenter asked whether the action had been originally presented (e.g., “Was the action ‘open the book’ presented?”). If the response was “yes,” the experimenter asked whether the action was originally performed, imagined, or heard. Selected actions were challenged at this point (see the next paragraph). The experimenter then asked the participant to verbally rate these actions using the five items (belief in occurrence, recollection, vividness, visual, and reexperiencing), which were presented on a printed sheet.

During the test, memory for a number of correctly recollected performed actions was challenged. For every third item that was actually performed and for which the participant said that it was performed, the participant was told “You said ‘performed’; this action was imagined in the first session.” The feedback schedule was organized with the goal of providing at least four challenges per participant, on the basis of the assumption that participants would correctly recall at least 12 of the 24 originally performed actions; this seemed reasonable given that the average correct recognition rate for performed actions was 79% in Otgaar et al. (2016, Study 1).
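
The challenge schedule described above can be illustrated with a short sketch. The function name `schedule_challenges` and the data layout (a list of actual and reported sources in test order) are assumptions made for illustration; they are not taken from the study materials.

```python
def schedule_challenges(trials, every=3):
    """Return indices of test trials that would be challenged.

    trials: list of (actual_source, reported_source) tuples in test order
            (hypothetical layout; sources are 'performed', 'imagined',
            'heard', or 'new').
    every:  challenge every n-th correctly recognized performed action.
    """
    challenged = []
    correct_performed = 0
    for index, (actual, reported) in enumerate(trials):
        if actual == "performed" and reported == "performed":
            correct_performed += 1
            if correct_performed % every == 0:
                challenged.append(index)
    return challenged


# A participant who correctly recognizes 12 of the 24 performed actions
# (the assumed minimum) receives 12 // 3 = 4 challenges.
example = [("performed", "performed")] * 12 + [("performed", "imagined")] * 12
n_challenges = len(schedule_challenges(example))
```

Under this rule, a participant with 9 to 11 correctly recognized performed actions would receive only three challenges, and one with 15 or more would receive at least five.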

Results

Was initial recognition of performed actions sufficient to facilitate the challenges?

Initial performance on the recognition test is provided in Table 1; these data provide the context within which the primary analyses below are to be understood. On average, participants correctly recalled 76% of performed actions, indicating that performance was above the presumed threshold of 50% that was required for presenting the challenges.

Table 1 Study 1: Average initial recognition test performance

Were four challenges presented per participant as planned?

Of the 45 participants, 42 received at least four challenges, and three received three challenges. Although some participants received more than four challenges, these items were not analyzed due to our prespecified interest in examining the same number of items per participant. The data for the first four challenged actions were analyzed. Including or excluding the three participants who received three challenges had no impact on the findings, and the cases were retained.

Did challenges affect ratings on average?

Average belief in occurrence, recollection, visual detail, vividness, and reexperiencing ratings for the challenged and nonchallenged actions are provided in Table 2.

Table 2 Study 1: Mean item ratings for correctly recollected performed actions by challenged and nonchallenged status

To avoid a number of the limitations associated with null hypothesis significance testing, we examined group differences per Cumming (2014), by calculating mean differences and standardized effect sizes along with their associated 95% confidence intervals. In this approach, statistical significance is indicated by lack of overlap in the confidence intervals, for differences between the group means, and lack of overlap of the confidence intervals with zero, for standardized effects and mean differences. If the confidence intervals of the means do not overlap, we can be confident that the means are statistically different. Challenged actions received statistically lower ratings for all five items. This shows that all variables were lower on average following challenges, with size of the effects ranging from moderate to strong. On the basis of comparing the confidence intervals on the standardized effects, the challenges had a numerically stronger effect on belief in occurrence and recollection ratings (which did not differ) than on visual, vividness, or reliving ratings. These findings show that ratings declined on average following challenges. However, we also expected, on the basis of Otgaar et al.’s (2016) results, some participants to defend the memory for at least some of the challenged actions. We next examined the propensity to defend versus reduce belief in initially recollected items.
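
The logic of this estimation approach can be sketched as follows. This is a minimal illustration, not the authors' computation: it assumes two independent sets of ratings, uses a normal approximation for the 95% interval rather than the t distribution, and the ratings are made up. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(group_a, group_b, level=0.95):
    """Mean difference (a - b) with a normal-approximation confidence interval.

    Treats the two groups as independent samples; this is a simplifying
    assumption for illustration only.
    """
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, (diff - z * se, diff + z * se)


def excludes_zero(interval):
    """True when the interval lies entirely above or entirely below zero."""
    low, high = interval
    return low > 0 or high < 0


# Made-up 7-point belief ratings, not data from the study.
nonchallenged = [7, 7, 6, 7, 7, 6, 7, 7]
challenged = [7, 4, 5, 3, 7, 2, 6, 4]
diff, ci = mean_difference_ci(nonchallenged, challenged)
```

An interval on the mean difference that excludes zero is read here as a statistically meaningful difference between challenged and nonchallenged ratings.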

Defending versus relinquishing belief in memories

The preceding analysis shows that challenging correctly recollected performed actions resulted in lower belief ratings than were found for nonchallenged correctly recollected performed actions. We next considered the frequency with which each participant defended versus reduced belief for memories of challenged actions. This also permitted recalculation of the impact of challenges on the other variables on the basis of whether individual memories were defended or relinquished.

In Otgaar et al. (2016), participants made dichotomous reports as to whether they believed the action had originally been presented. Here we measured belief in occurrence in a more typical manner, using a continuous scale. This required that we define what level of postmanipulation belief rating indicated belief reduction. To do this, we looked at the distributions of belief ratings for challenged and nonchallenged items (see Fig. 2). Here, as in the previous sections, we interpreted confidence intervals. Examination of the distributions indicated that challenged items were less likely than nonchallenged items to be rated at the scale ceiling, propdiff = 19.2 [10.8, 27.3], and were more likely to be rated on the lower four points of the scale, propdiff = 16.6 [10.5, 23.5]. We considered ratings of five or six on the scale to be ambiguous due to overlap between the distributions (see Footnote 1). On the basis of these data, we defined reduction as cases in which belief was four or lower, and defense as cases in which belief was rated at the scale ceiling. By this definition, 44.1% (N = 78) of challenges resulted in defense, and 23.2% (N = 41) resulted in reduction.
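
The classification rule defined above can be written out as a small sketch. The function names are hypothetical; the cutoffs and the counts (78 defended and 41 reduced of 177 challenged actions) are those reported in the text and in the caption of Fig. 2.

```python
def classify_belief(rating):
    """Classify a postchallenge belief rating on the 1-7 scale.

    7      -> 'defended'  (scale ceiling)
    1 to 4 -> 'reduced'
    5 or 6 -> 'ambiguous' (distributions overlap)
    """
    if not 1 <= rating <= 7:
        raise ValueError("belief ratings range from 1 to 7")
    if rating == 7:
        return "defended"
    if rating <= 4:
        return "reduced"
    return "ambiguous"


def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts reported in the text: 78 defended and 41 reduced of 177 challenges.
total_challenges = 177
defended_pct = percent(78, total_challenges)
reduced_pct = percent(41, total_challenges)
ambiguous_count = total_challenges - 78 - 41
```

The remaining challenged actions, those rated five or six, fall into the ambiguous category under this rule.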

Fig. 2 Proportions of ratings at each level of the belief in occurrence scale, for challenged (n = 177) and nonchallenged (n = 575) performed statements. Error bars show 95% confidence intervals for the proportions

We calculated the average ratings for the five variables based on defended versus reduced status (Table 3). The ratings for defended actions were high across all rated variables, and the ratings of all five variables for all defended actions exceeded the ratings made to control actions (see Table 2 for control ratings for nonchallenged actions). The ratings for reduced belief actions were consistently lower than those for defended belief actions, with ratings of the former falling below the midpoint of the belief scale. The ratings did not differ across items within reduced actions.

Table 3 Study 1: Mean item ratings for reduced and defended actions

Do participants show tendencies to always defend or always relinquish?

Using the definitions for defense and relinquishment above, 19 participants (42.2%) always defended, 12 (26.7%) always reduced belief, and 12 (26.7%) showed a mixture of defense and reduction (see Table 4, column a). Two participants did not contribute to this analysis due to making postmanipulation belief ratings in the ambiguous range.

Table 4 Proportions of challenged events relinquished by participants

To explore the impact of the ambiguous challenged actions on the results, we recalculated these rates, once assuming that the 58 ambiguous actions indicated defense (adding defense) (Table 4, column b) and once assuming that the same actions indicated reduction (adding reduction) (column c). The results across the two calculation methods indicate that somewhere between 15% (adding reduction) and 46% (adding defense) of participants defended belief for all events, and that between 2% (adding defense) and 31% (adding reduction) reduced belief for all events.
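
The sensitivity analysis can be sketched as a recoding of each participant's ratings under three rules. This is an illustration under stated assumptions: the function name, the labels, and the treatment of ambiguous ratings in the baseline rule (dropping them) are hypothetical, and the ratings are made up.

```python
def participant_pattern(ratings, ambiguous_as=None):
    """Summarize one participant's responses to their challenges.

    ratings:      postchallenge belief ratings (1-7), one per challenge.
    ambiguous_as: None (drop ratings of 5 or 6), 'defended', or 'reduced'.
    Returns 'always defended', 'always reduced', 'mixed', or 'unclassified'.
    """
    labels = []
    for rating in ratings:
        if rating == 7:
            labels.append("defended")
        elif rating <= 4:
            labels.append("reduced")
        elif ambiguous_as is not None:
            labels.append(ambiguous_as)
    if not labels:
        return "unclassified"
    kinds = set(labels)
    if kinds == {"defended"}:
        return "always defended"
    if kinds == {"reduced"}:
        return "always reduced"
    return "mixed"


# Made-up ratings for one participant with four challenges.
ratings = [7, 6, 7, 7]
baseline = participant_pattern(ratings)
adding_defense = participant_pattern(ratings, ambiguous_as="defended")
adding_reduction = participant_pattern(ratings, ambiguous_as="reduced")
```

In this example a single ambiguous rating moves the participant from "always defended" to "mixed" when ambiguous ratings are counted as reduction, which is why the range of estimates across the two recodings is wide.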

Pattern of responding to challenges

For 53.3% of the participants, belief was defended for the first challenge; for 20.0%, belief was reduced for the first challenge; and for 26.7%, the belief rating for the first challenged action fell into the ambiguous category. Relative to all other participants, participants who defended belief for the first challenge on average defended more of the subsequent challenged actions (Mprop = .56 [.43, .71] vs. .24 [.08, .41]) and reduced belief for fewer of them (Mprop = .16 [.06, .27] vs. .34 [.22, .48]). Relative to all other participants, those who relinquished belief for the first challenge defended fewer subsequent challenges on average (Mprop = .18 [.00, .44] vs. .47 [.35, .60]) and reduced belief for more of the subsequent challenged actions (Mprop = .48 [.25, .70] vs. .19 [.00, .44]). The response to the first challenge predicted the direction of subsequent responses to challenges. In other words, reducing belief for the first challenge predicted both more belief reduction (rho = .37 [.06, .63]) and defending at fewer subsequent challenges (rho = –.30 [–.54, –.01]) than among all other participants. Conversely, defending belief for the first challenge predicted defending at more subsequent challenges (rho = .45 [.17, .70]) and reducing at fewer subsequent challenges (rho = –.33 [–.59, –.06]).

Did the challenge procedure result in nonbelieved memories?

Nonbelieved memories (NBMs) are typically defined as events for which recollection exceeds belief (Clark et al., 2012; Mazzoni et al., 2010; Scoboria et al., 2014). We looked to see whether the procedure resulted in such memories. Across all challenged performed actions, 14 recollection ratings from 12 participants were rated one or two points higher than belief. This represents a small proportion (7.9%) of the challenged actions, and too small a number for further analysis. Hence, we opted to not examine NBMs further in Study 1.
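
The NBM criterion used here, recollection exceeding belief, can be expressed as a short sketch. The function name and the example rating pairs are hypothetical; only the counts in the final line (14 of 177 challenged actions) come from the text.

```python
def count_nonbelieved(items):
    """Count items whose recollection rating exceeds the belief rating.

    items: list of (belief, recollection) rating pairs on the 1-7 scales.
    """
    return sum(1 for belief, recollection in items if recollection > belief)


# Made-up (belief, recollection) pairs, not data from the study.
items = [(7, 7), (3, 5), (4, 4), (2, 3), (6, 5)]
n_nbm = count_nonbelieved(items)

# Proportion reported in the text: 14 of 177 challenged actions.
reported_pct = round(100 * 14 / 177, 1)
```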

Study 2

Study 1 extended our understanding as to the rate and extent to which people defend or reduce belief in recalled actions following social challenges, by providing an equivalent number of challenges per participant, by taking continuous ratings for all items, and by making contrasts with nonchallenged control items. The number of challenges per participant was nearly uniform, and the number of challenges that could be included in analyses was almost equivalent across the sample. The findings confirmed that even for strongly believed and recollected memories for performed actions, in more than 20% of cases the challenges were associated with lower belief than was found for nonchallenged performed actions. As expected given the vivid nature of the memories and that challenges immediately followed initial recall of the actions, a substantial proportion of the challenges were resisted and the belief defended. More than 40% of participants defended against all challenges, with the balance reducing belief following all challenges or showing a mixture of defense and belief reduction.

We remind readers that in Study 1 the challenges immediately followed initial recall and preceded rating of the actions. This aspect of the procedure might have also influenced the frequency of defense. Challenges might not be so effective when memorial decisions are very recent and strong, thus resulting in many challenged memories being defended. The role of taking ratings prior to and after challenges was examined in Study 2, when the challenges were postponed and given following initial belief ratings.

One limitation of Study 1 was that the use of the continuous scale created an overlap in the distributions of belief ratings for the challenged and nonchallenged items, which made it difficult to confidently classify responses to all of the challenges. A more direct measure of the rate of defending versus reducing belief in response to social challenge can be obtained by measuring pre–post changes in belief. The lack of a pre–post measure of change was also a limitation of the Otgaar et al. (2016) study. In that study the initial level of belief before the challenge was not measured, and it was assumed that the actions were believed prior to the challenge. Although this seems reasonable, because the initial statement on the recognition test was “Yes the action was presented, and I performed it,” the lack of pre–post measure of belief makes conclusions about individual differences in the tendency to either defend or relinquish belief and the degree of belief change problematic. Thus, Study 2 replicated Study 1 using a pre–post design. This allowed us to examine both frequency and magnitude of change in ratings for challenged and control items. Given that making belief ratings prior to the challenge may serve to anchor the ratings, one could expect that memory defense might be more likely than in Study 1. Alternatively, relinquishment might be more likely, due to a delay between rating numerous actions during the test and receiving challenges to selected memories once the test was complete.

Method

Participants

Ninety-five university students received course credit for completing the study. This larger sample than we had used in Study 1 made it possible to further examine individual differences in responding. Ten participants were removed due to having fewer than three performed actions challenged (due to low performance on the initial recognition test). The demographics for the final sample of 85 were Mage = 21.0 years, SD = 4.4, range 18 to 50, 69% female.

Procedure

The procedure for Session 1 was identical to that in Study 1. In Session 2, participants completed the recognition test and rated each action using the same five scales as in Study 1; in this study, the challenges were presented after the recognition test was completed. The researcher left the room for 10 min to prepare the next phase of the study; participants took their break at this point. On return, the researcher re-presented 16 actions; eight of these had been correctly recalled as originally performed, and the other eight had been correctly recalled as originally imagined. Half (four) of the correctly recalled performed actions were challenged and half (four) were not (challenge or control status was randomly assigned), and new ratings were taken. The same occurred for the eight imagined actions (half challenged, half not challenged). This produced four groups of target actions: four performed challenged, four performed control, four imagined challenged, four imagined control. As in Study 1, our focus remained on the performed challenged actions; the imagined actions were included to mask this interest, so that it would be less obvious that the challenges were directed at remembered performed actions. For challenged performed actions, the researcher told the participants that they had originally imagined the action. For challenged imagined actions, the researcher told the participants that they had originally heard the action. After presenting each action, the researcher asked the participant to rate the action again using the same five scales.

Results

Was initial recognition performance similar to that seen in Study 1?

Given that the two studies were identical up to the test, except that the challenges were presented during the test in Study 1 and after the initial ratings in Study 2, we anticipated that the initial recognition performance would be similar. As can be seen in Table 5, this was the case. The average correct recognition for the performed actions was 74%, which was statistically the same as the performance in Study 1 (76%).

Table 5 Study 2: Average initial recognition test performance

Did challenges affect item ratings on average?

Premanipulation, postmanipulation, and change scores by action type (performed, imagined) and challenge type (challenged, control) are provided in Table 6 and Fig. 3. We compared the premanipulation ratings between challenged and nonchallenged items across the variables and found no statistical differences for performed or imagined actions (as in Study 1, we examined confidence intervals).

Table 6 Study 2: Average Time 1, Time 2, and difference scores by challenge status

Fig. 3 Changes in ratings for challenged and control items. Recc, Recollection; Reexp, Reexperiencing. Error bars show 95% confidence intervals for the means

As is evident in Fig. 3, challenging both performed and imagined items resulted in statistically meaningful reductions in scores across all items. The decrease in belief in occurrence and recollection ratings was larger for challenged performed than for challenged imagined actions (Mdiff = 2.37 [95% CI 2.05, 2.68]). This may be in part related to higher premanipulation ratings for performed than for imagined actions (see Table 6), which means that scores had the potential to decrease to a greater extent.

Challenging performed actions resulted in decreased scores for all variables as compared to the control items, and control items showed zero change, on average, for all variables. The effect of challenging performed actions was largest for belief ratings, with an average decrease of three points on the 7-point scale. Belief ratings decreased to a greater extent than did recollection ratings, Mdiff = .62 [.31, .95]. Scores on the three memory characteristic items (visual, vivid, reexperiencing) also decreased but to a statistically lesser extent (see Table 6).

We further examined whether defended and reduced challenged actions differed on any of the additional memory characteristic variables (visual, vivid, reexperiencing, and recollection) at the prechallenge rating. Challenged performed actions that were defended received higher reexperiencing ratings than challenged performed actions that were relinquished (Mean defended = 6.41; Mean reduced = 5.90; Mdiff = 0.502 [95% CIdiff = 0.20, 0.80]; dunb = 0.37 [0.22, 0.52]). For thoroughness of reporting, we also note that vividness ratings were in the expected direction but did not reach statistical significance, Mdiff = –0.347 [–0.73, 0.035], dunb = 0.22 [0.05, 0.38]. This may indicate that participants used strength of autonoetic awareness when deciding how to respond to challenges, and that on average they tended to accept challenges associated with lower feelings of reexperiencing and reject challenges with higher feelings of reexperiencing.

Challenging imagined actions also resulted in significantly decreased scores for all variables; the degree of change was similar across items (about half a point on the scale on average). Belief and recollection ratings decreased to a similar extent, Mdiff = .10 [–.02, .16]. Control imagined items showed zero change on average for all variables.

Do participants sometimes defend and sometimes relinquish belief in memories?

To facilitate comparison between the studies, we first defined defense and reduction in the same manner as in Study 1 (scores of 7 indicated defense, and scores of 4 or lower indicated reduction of belief; see Fig. 4 for the distribution of scores). Overall, for challenged actions participants were more likely to provide scores below the scale ceiling, and also were more likely to rate the item at the scale floor, as compared to Study 1. By this definition, 21.0% of challenges resulted in defense and 62.7% resulted in relinquishment of belief.

Fig. 4 Proportions of ratings at each level of the belief in occurrence scale, for challenged and nonchallenged performed statements. Postchallenge ratings are provided for challenged actions

The pre–post design permitted us to define defense as maintenance of or increase in belief and reduction as any decrease in belief, permitting the inclusion of all challenged performed actions in the calculation. The distribution of change scores is provided in Fig. 5. By this definition, 25.4% of the challenges resulted in defense, and 74.6% resulted in some degree of reduction. A small number of ratings increased (3.0%).

Fig. 5 Changes in belief in occurrence scores for performed challenged actions (post minus pre score). Belief in occurrence was rated on a 7-point scale; hence, the largest change possible was six points
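
The two operational definitions of defense and reduction described above can be made concrete with a short sketch. It is illustrative only and is not the analysis code used in the studies: the function names and the example ratings are hypothetical, and the treatment of postchallenge scores of 5 or 6 as an "ambiguous" category under the Study 1 definition is inferred from the text.

```python
from collections import Counter

SCALE_MAX = 7  # belief in occurrence was rated on a 7-point scale


def classify_by_post_score(post):
    """Study 1 definition: label from the postchallenge belief rating alone."""
    if post == SCALE_MAX:
        return "defended"
    if post <= 4:
        return "reduced"
    return "ambiguous"  # ratings of 5 or 6 fit neither label (assumption)


def classify_by_change(pre, post):
    """Study 2 definition: label from the change score (post minus pre)."""
    # Maintenance (0) or an increase counts as defense; any decrease as reduction.
    return "defended" if post - pre >= 0 else "reduced"


# Hypothetical (pre, post) belief ratings for five challenged performed actions.
ratings = [(7, 7), (7, 6), (7, 2), (6, 7), (7, 4)]

by_post = Counter(classify_by_post_score(post) for _, post in ratings)
by_change = Counter(classify_by_change(pre, post) for pre, post in ratings)

print("Study 1 definition:", dict(by_post))
print("Study 2 definition:", dict(by_change))
```

Because the change-score definition has no ambiguous category, every challenged action is assigned to one of the two outcomes, which is why the Study 2 percentages sum to 100.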

We calculated average ratings for the variables based on defended or reduced status (Table 7). For defended items, ratings were generally high. For reduced items, ratings of belief in occurrence and recollection were reliably below the midpoint of the scale, and ratings for visual, vividness, and reexperiencing were higher at about the scale midpoint. This indicates that when accepted, challenges had a greater impact on metacognitive appraisals of occurrence and recollection than on strength of mental simulation (vividness, visual detail).

Table 7 Study 2: Postmanipulation item ratings for reduced and defended actions

Do participants show tendencies to always defend or always reduce?

Using the Study 1 definitions, which labeled defense or reduction on the basis of postchallenge scores (Table 8, Column A), 12 individuals (14.1%) always defended, and 53 (62.4%) always relinquished belief when challenged. As in Study 1, in two further analyses we also assumed that the ambiguous cases indicated either defense or relinquishment. The results across the calculation methods indicated that somewhere between 6% and 18% of participants defended belief for all events, and that between 44% and 66% reduced belief for all events. When relinquishment was defined as any decrease in score (Study 2 definition, Table 8, Column D), 8.2% defended at all challenges, and 57.6% accepted all four challenges and thus relinquished belief to some degree.

Table 8 Study 2: Individual differences in memory reduction rates
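
As an illustration of how the range of estimates above can arise, the following sketch labels a participant as always defending, always reducing, or mixed across the four challenged performed actions, under different assumptions about ambiguous responses. The function name, its parameter, and the example data are hypothetical, and treating an unresolved ambiguous response as neither defense nor reduction is our assumption for the sketch rather than a documented detail of the analysis.

```python
def response_pattern(labels, ambiguous_as=None):
    """Summarize one participant's responses to the challenged actions.

    labels: list of "defended", "reduced", or "ambiguous", one per challenge.
    ambiguous_as: if given ("defended" or "reduced"), ambiguous responses are
        recoded to that label before the pattern is computed.
    """
    if ambiguous_as is not None:
        labels = [ambiguous_as if lab == "ambiguous" else lab for lab in labels]
    if all(lab == "defended" for lab in labels):
        return "always defended"
    if all(lab == "reduced" for lab in labels):
        return "always reduced"
    return "mixed"


# Hypothetical participant with one ambiguous response among four challenges.
participant = ["defended", "defended", "ambiguous", "defended"]

strict = response_pattern(participant)
as_defense = response_pattern(participant, ambiguous_as="defended")
as_reduction = response_pattern(participant, ambiguous_as="reduced")

print(strict, "|", as_defense, "|", as_reduction)
```

The same participant counts as an "always defended" case only when ambiguous responses are assumed to indicate defense, which is how the different calculation methods yield lower and upper bounds on the proportions.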

Pattern of responding

Per the Study 1 definition, 11.8% defended at the first challenge, 68.2% reduced belief, and for 20.0% the first response fell into the ambiguous category. By the Study 2 definition, 14.1% defended and 85.9% reduced following the first challenge. Individuals who defended at the first challenge defended more at subsequent challenges (Mprop = .75 [.56, .91]) than did those who relinquished at the first challenge (Mprop = .21 [.14, .30]). The correlation between postmanipulation belief score and the number of subsequent actions defended was rho = .45. As in Study 1, the response to the first challenge predicted that individuals would make similar responses to subsequent challenges.

Nonbelieved memories (NBMs)

There was a notable number of instances in which recollection ratings were higher than belief in occurrence ratings for challenged performed actions in Study 2. Hence, we examined whether NBMs resulted for some challenged actions, and explored whether the NBMs fit into the three subtypes discussed by Scoboria et al. (2017). To be categorized as an NBM, the Time 2 belief score had to be one or more points lower than the Time 2 recollection score; this criterion was met for 72 performed challenged actions and 10 performed control actions. The following definitions for the three subtypes of NBMs were based on these data. To be categorized as a “classic NBM,” characterized by high recollection and substantially reduced belief, the Time 2 belief needed to be low (<5 on the 7-point scale) and the Time 2 recollection high (>4). To be categorized as a “grain-of-doubt NBM,” characterized by high recollection and slightly reduced belief, the Time 2 belief and recollection both needed to be high (>4). To be categorized as a “weak NBM,” characterized by lower recollection and belief, the Time 2 belief and recollection both needed to be low (<5).
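
The categorization rules above can be expressed as a short decision procedure. This is a minimal sketch under stated assumptions, not the authors' scoring script: the function name is hypothetical, and ratings are assumed to be whole numbers from 1 to 7, so that "low (<5)" and "high (>4)" are complementary.

```python
def classify_nbm(belief_t2, recollection_t2):
    """Return the NBM subtype for one action, or None if it is not an NBM.

    Both arguments are Time 2 ratings on the 7-point scale (assumed integers).
    """
    # NBM criterion: belief at least one point lower than recollection.
    if recollection_t2 - belief_t2 < 1:
        return None

    belief_high = belief_t2 > 4
    recollection_high = recollection_t2 > 4

    if recollection_high and not belief_high:
        return "classic"         # high recollection, substantially reduced belief
    if recollection_high and belief_high:
        return "grain-of-doubt"  # high recollection, slightly reduced belief
    # Low recollection with high belief cannot pass the NBM criterion,
    # so the only remaining case is low belief and low recollection.
    return "weak"


# Hypothetical (belief, recollection) pairs at Time 2.
examples = [(2, 7), (6, 7), (2, 4), (7, 7), (4, 5)]
subtypes = [classify_nbm(b, r) for b, r in examples]
print(subtypes)
```

Note that the three subtypes are exhaustive for any pair of ratings that meets the NBM criterion, because belief cannot be high while recollection is low if belief must be lower than recollection.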

For performed challenged actions, a total of 39 classic NBMs were produced by 15 participants (range 1 to 4); ten grain-of-doubt NBMs were produced by eight participants (range 0 to 2); and 23 weak NBMs were produced by 15 participants (range 0 to 2). Considering all NBMs to challenged actions, 31 participants (36.5% of the sample) produced a total of 72 NBMs (mean = .85, range 0 to 4). The average item ratings by NBM type are provided in Table 9. Classic NBMs were characterized by small but statistically meaningful decreases in recollection, visual, and reexperiencing ratings. Grain-of-doubt NBMs were characterized only by a decrease in vividness ratings. Weak NBMs were characterized by lower initial belief, recollection, visual, and reexperiencing ratings than were classic NBMs, as well as a statistically larger decrease in recollection and numerically (but not statistically) greater decreases in visual, vividness, and reexperiencing ratings.

Table 9 Study 2: Ratings of nonbelieved memories (NBMs) for challenged performed actions by NBM subtype

For performed control actions, there were no classic, nine grain-of-doubt (from nine participants), and one weak NBM; 50% of the participants with a grain-of-doubt NBM for a performed control action also produced a grain-of-doubt NBM to a performed challenged action.

General discussion

These studies confirmed that challenging vivid memories for performed actions can lead to decreases in belief in occurrence, recollection, mental simulation, and reexperiencing ratings, on average. This effect was larger for occurrence than for characteristics associated with recollection (visual detail, vividness, reexperiencing), indicating that overarching occurrence appraisals are more susceptible to revision than are the more basic component processes that are thought to underlie recollective experiences (see Cabeza & Moscovitch, 2013; Rubin, 2006, for more on component processes). Our results also clarify that challenging correct memories affects occurrence ratings to a greater extent than recollection ratings when occurrence and recollection ratings are examined in a pre–post manner, replicating prior findings (Clark et al., 2012; Mazzoni et al., 2014).

We emphasize that limiting the analyses to the examination of average performance obscures the fact that participants respond to challenges sometimes by defending and sometimes by reducing belief. In Study 1, in which the definition of defense versus relinquishment of belief was based on the distribution of responses and challenges occurred immediately during the memory test, defense was more likely. When a pre–post design was used, and the definition of defense versus relinquishment was based on a change in ratings and challenges occurred after the memory test was complete (Study 2), belief reduction was more likely. This difference cannot be attributed to the difference in the ways that defense and relinquishment were defined in the two studies, because reduction remained more common than defense in Study 2 even when the Study 1 definitions were applied (62.7% vs. 21.0% of challenges). A major difference in the procedures between the two studies was that in Study 1 the challenges were encountered as participants moved through the test, immediately after judging the source of items and prior to rating the challenged memories (on belief, recollection, vividness, visual detail, and reexperiencing). The challenged memories had been very recently retrieved and had strong associated recollective evidence available. This may have led to greater resistance to the challenges. In Study 2 the items were initially rated during the test but not challenged until after the test had been completed. This means that the memories were not challenged until well after initial retrieval, and after numerous items had been retrieved. The delay and intervening ratings between the initial retrieval and rating and the challenge may have reduced the availability of internal evidence about the challenged items and made belief more amenable to revision.

Our studies reflect a type of social interaction in which the participant and the researcher both observed a series of events, and one then provided the other with feedback about the quality of recall. This presents some similarities to other conditions in which social interactions influence memory reports. In co-witness situations, for example, two people witness an event and then later recall the event together (Gabbert, Memon, Allan, & Wright, 2004), and in memory conformity studies information provided by a source is known to influence the content of the memory report (Wright, Self, & Justice, 2000). Discussions following the encoding of information can produce notable distortion and conformity of memory reports provided alone at a later time (Gabbert, Memon, & Allan, 2003). An important potential outcome when witnesses discuss memories for co-witnessed events is that subsequent accounts provided by each become more similar, and hence appear to be more corroborative (Hope, Ost, Gabbert, Healey, & Lenton, 2008). A variety of social factors have been shown to influence the outcomes of co-witness interactions and memory conformity studies. The literature on memory conformity has focused on the ways in which information provided by others can affect memory for the details of shared events. Studies have also shown that social power in relationships (Skagerberg & Wright, 2008), the person who initiates the interaction, and the perceived credibility, trustworthiness, and accuracy of the other party can influence the degree to which co-witness information is incorporated into memory reports (Kwong See, Hoffman, & Wood, 2001). Witnesses tend to conform to the co-witness who shows higher confidence (Wright, Self, & Justice, 2000) and to co-witnesses who are perceived as having encoded the original event more effectively (Gabbert, Memon, & Wright, 2006).

The relationship between such social factors and memory defense and reduction is not yet established. Presumably in our studies the credibility of the experimenter was fairly high and the particular feedback provided was perceived as trustworthy, factors that might have led to the observed drops in belief and recollection. However, it is premature to try to explain why for some individuals the effect of these and similar social factors is strong, and for others they have less, if any, influence. Measuring directly perceptions of credibility of the experimenter and the feedback itself, as well as individual characteristics, might provide additional important information about the impact of social feedback on belief and recollection judgments. Future studies could use manipulations of social variables such as credibility, social pressure, and so forth, to explore additional effects of memorial feedback on belief in occurrence and recollection, along with different cognitive/social/personality characteristics. As has recently been demonstrated, conceiving the effect of social feedback on memory in terms of persuasion might prove useful as well (Nash, Wheeler, & Hope, 2015).

Such factors may prove valuable in situations in which the quality of memory reports is evaluated, such as during forensic proceedings, and might shed further light on debated topics such as false confessions and retractors. As a side note, the procedure in Study 2 also permitted us to observe instances in which belief ratings increased, rather than decreased, following challenges. Although this occurred infrequently (largely because belief ratings tended to be at the scale ceiling prior to the challenge, leaving no room for increase), it shows that the cognitive processing that goes into resolving the discrepancy between memory and disconfirmatory feedback may also result in a stronger sense of belief in cases of memory defense. Future research might consider ways of addressing this ceiling effect, perhaps by querying participants as to whether they experience a sense of strengthened or weakened belief.

We note that our between-study comparisons are best thought of as tentative, because the participants were not randomized across studies. However, the fact that the encoding portions of the studies were identical and the corresponding results were quite similar is a strength when comparing the studies.

The main difference between the studies was the presence (Study 2) or absence (Study 1) of pre–post measurement, which also affected the timing of the challenges. In Study 1, the challenges occurred during the recognition test at the time the items were presented, after participants had made the recognition and source-monitoring judgments and just prior to taking the postchallenge item ratings. In Study 2, the challenges occurred after the initial (prechallenge) item ratings and once the entire (fairly long) recognition and source-monitoring test was complete; postchallenge ratings were taken shortly after each item was challenged, and therefore after a longer delay from initial retrieval. Future research might further examine the effects of different periods of delay between the recognition test, prechallenge, and postchallenge ratings. This would help determine whether decisions made on the initial recognition test, or when initially rating items, serve to anchor subsequent ratings. In other words, the results of Study 2 may partly reflect anchoring effects of pretest measures of belief and recollection. However, if anchoring effects had played a large role in Study 2, we would have expected to find higher memory defense than in Study 1; we actually found the opposite. Although this could also be the result of other methodological differences between the studies, the opposite findings imply that the preratings are unlikely to have served as anchors in Study 2.

We chose to focus the challenges about memory on correctly recalled performed actions in Study 2. Examining other items (such as imagined/performed [false memories]) in future studies might be interesting, in order to better understand the impact of social feedback on the creation of false memories as well as the consistency of other initial memorial statements. One goal of these studies was also to learn more about individual differences in the tendency to defend or reduce belief in response to social challenge. The findings confirm the existence of individual differences, with some participants always defending and others always reducing their belief, and still others showing a mixture of defense and reduction. Interestingly, in both studies the response to the initial challenge predicted responding to the subsequent challenges—defending at the first challenge was associated with greater likelihood of defending at the next three challenges, and likewise reducing after the first challenge was associated with a greater likelihood of reducing after the subsequent challenges. Variables that may predict these tendencies can be measured in future studies (e.g., compliance, memory distrust, or trait submissiveness).

The present studies also provided participants with a compelling face-saving explanation for their alleged memory errors—the feedback was accompanied by the explanation that the item was “imagined and not performed.” Hence, the errors were explained as errors in source monitoring, and the memory challenges thus encouraged participants to reattribute the source of the memory. Given the large numbers of items that were performed, imagined, and heard, as well as the length of the procedure, misattributing sources seems likely to have been a credible explanation for errors, and participants may not have been particularly surprised at being told that they had made source-monitoring errors. The source-monitoring framework (Johnson, Hashtroudi, & Lindsay, 1993) remains a suitable theoretical foundation for studying social effects on memory (Leding, 2012; Nash et al., 2015).

The relationship between such social factors and the individual propensity to defend or reduce belief in the occurrence of memories has yet to be studied. Individual perceptions of the credibility of the experimenter and the feedback itself might be measured directly in future studies. The feedback provided was generally plausible (although the plausibility of the feedback was not measured directly), whereas more unusual or “bizarre” feedback might be more likely to reveal that misinformation is being provided (Thomas & Loftus, 2002) and might undermine trust in the messenger and/or the message. Future studies could incorporate existing manipulations of credibility and social pressure from the literature in order to explore additional effects of memorial feedback on belief in occurrence and recollection. Nash, Wheeler, and Hope (2015) discussed changes to belief in the occurrence of remembered events in response to social feedback in terms of “persuasion” and “attitude change.” They also discussed the need to better draw together “cognitive” and “social” explanations when attempting to understand memory. In addition, they pointed out parallels between the source-monitoring framework (Johnson et al., 1993) and models of persuasion and attitude change.

The social pressures to comply with the feedback in these studies were likely fairly strong. The material that was being remembered was not personally important, so it remains questionable whether it mattered to participants whether or not they were correct. To our knowledge, no research has measured how participants in similar studies have reacted when receiving feedback about their recall. Anecdotally, we noted that some participants in these studies appeared to be personally bothered when receiving the (erroneous) feedback (but we did not collect any systematic data on this point). Such findings may prove valuable in situations in which the quality of memory reports is evaluated, such as during forensic proceedings. They indicate that it may eventually be possible to examine whether feedback provided by other people (e.g., investigators, co-witnesses) about remembered information may influence belief in the occurrence of past events. These findings coincide with arguments that belief in occurrence is sensitive to social feedback (Scoboria et al., 2014).

Extended investigation of NBMs

Study 2 provides new insights into the creation of NBMs in the laboratory. This was the first experimental study to produce the three subtypes of naturally occurring NBMs identified by Scoboria et al. (2017). Their study was based on retrospective ratings of long-term autobiographical NBMs, whereas this study was the first opportunity to experimentally address questions about how events are rated prior to developing into NBMs. Hence, these results provide evidence about prechallenge factors that differentiate the subtypes of NBMs.

The findings of Study 2 confirmed that a variety of types of NBMs result when memories are challenged, as was already observed by Scoboria et al. (2017). The classic NBM was the most frequently observed subtype. These memories were initially rated on average as being strongly believed, strongly recollected, and associated with strong mental simulation and reexperiencing. The challenges resulted in notable relinquishment of belief in occurrence, with slight decreases in recollection and recollective features. Why people decide to relinquish belief in such strong memory-like mental simulations remains an interesting and still scarcely explored question.

The weak NBM subtype was the second most frequent. As we had previously speculated, these were somewhat “weaker” memories prior to the challenge. Ratings for all items (visual, vivid, reexperience) were already relatively low prior to the challenge, and statistically lower than for the classic NBMs. Furthermore, all ratings decreased substantially following feedback (recollection to a statistically greater extent, and numerically for the other items). It seems possible that the relatively weaker memorial experience associated with these items resulted in a broader downgrading of all ratings when the memory was challenged. This explanation remains tentative, due to overlap in confidence intervals between the different subtypes of NBMs. Research that uses this procedure with larger samples might help make more definitive statements about the NBM subtypes.

The grain-of-doubt NBM was observed least frequently. Unlike the other two types of NBMs, which occurred almost exclusively for challenged items, this subtype appeared about equally often for challenged (ten) and control (nine) items, and half of the participants who produced this type of NBM to a challenged item also produced one to a control item. These were memories that had started with high ratings. One possibility is that these items reflect some degree of normative fluctuation in ratings. However, the visual and reexperiencing ratings did not decrease statistically, suggesting that for these items a strong mental simulation might have continued to bolster recollection, and the contradiction between the feedback and the vivid image resulted in slightly reduced belief in the memory’s occurrence (the “grain of doubt”). Another possibility is that grain-of-doubt profiles may occur when memories are appraised and reappraised, and some individuals might have a tendency to produce this type of profile when rating memories. Little evidence helps to differentiate classic and grain-of-doubt NBMs. The main difference here was that classic NBMs exhibited a small but statistically meaningful decrease in all mental simulation and recollection variables, whereas this was the case for only one item (vivid) for the grain subtype. This suggests that the degree to which the strength of mental representations is amenable to revision may differentiate these subtypes. Scoboria, Nash, and Mazzoni (2017) discussed manipulations that might differentially produce the different NBM subtypes. Given the fairly small number of the “grain” subtype, we leave further exploration of these issues to subsequent research.

Of course, there are obvious differences between memory for simple actions performed in the lab and the types of elaborate autobiographical experiences that people describe when considering naturally occurring NBMs. The present findings suggest that NBMs for autobiographical events may be the result of more basic processes that influence the degree to which belief is affected by disconfirmatory feedback. Future research can seek to create conditions in which experimentally controlled events may resemble rich autobiographical memories to a greater extent.