People frequently make errors during learning or when they attempt to retrieve information from memory. Upon receiving feedback, however, they are often quite good at correcting those mistakes. The power of feedback has been demonstrated in many domains, including learning English translations of foreign language words (Pashler, Cepeda, Wixted, & Rohrer, 2005), definitions of English vocabulary (Metcalfe & Kornell, 2007), and science concepts such as the respiratory system (Butler, Godbole, & Marsh, 2013), brain regions (Lantz & Stawiski, 2014), and the solar system (Little & Bjork, 2014). Feedback is one of the most effective tools in the teacher’s toolbox (effect size: d = 0.73; Hattie, 2009); in one study, providing feedback after incorrect translations increased final retention by 494% (Pashler et al., 2005).

Other errors, however, are not so easily corrected. In particular, people often misremember the details of events, or even falsely remember entire events that never occurred. It is notoriously difficult to avoid and correct such false memories. For example, hearing a list of semantically related words like bed, rest, and tired yields later claims that a nonpresented word, for instance sleep, was also on the list (the Deese/Roediger–McDermott [DRM] illusion; Roediger & McDermott, 1995). People misremember sentences like The new baby stayed awake all night as The new baby cried all night (Brewer, 1977). Answering leading questions like How fast were the cars going when they smashed into each other? evokes memories of (nonexistent) broken glass at the scene of an accident (Loftus & Palmer, 1974). As compared to other memory errors, false memories are often associated with vivid (but inaccurate) experiences of remembering, or the feeling that one recollects specific details of the event (Chan & McDermott, 2006; Roediger & McDermott, 1995). Two common attempts to correct these errors involve specifically warning participants that an activity can yield false memories (e.g., Gallo, Roberts, & Seamon, 1997; McDermott & Roediger, 1998) and allowing multiple encoding opportunities of the to-be-remembered information before a memory test is given (e.g., McDermott & Chan, 2006; Watson, McDermott, & Balota, 2004). Unfortunately, neither method is particularly effective. A strong warning combined with a practice list and a full explanation of the DRM illusion still results in false recognition of nearly half of the critical lures (Gallo et al., 1997), and false recall of almost one-third of them (Watson et al., 2004). After three encoding opportunities of pragmatic inferences (e.g., The new baby stayed awake all night), learners still “recognize” the inference (false memory) answer on 28% of the final test trials (McDermott & Chan, 2006).

Why is it so hard to correct false memories, when it appears relatively simple to correct mistranslations of foreign words (Pashler et al., 2005), definitions of vocabulary words (Metcalfe & Kornell, 2007), and facts about science (Butler, Fazio, & Marsh, 2011)? We believe two factors are key:

1. The learner needs to realize that a mistake has been made.

Our argument is that learners must first notice their errors in order to correct them. This requirement is simple in many cases, such as when the learner is aware that he or she has no idea of the answer (e.g., you likely know if you don’t know the translation of the Luganda word leero). However, almost by definition, false memories mean that learners are unaware of their mistakes—such memories are accompanied by confidence and the subjective (but false) experience of recalling sounds, feelings, or other experiences from the original event (Chan & McDermott, 2006; Roediger & McDermott, 1995). The vividness of these errors may make the learner resistant to feedback, similar to the case in which two people both claim a memory as their own, despite knowing that the event could only have happened to one of them (disputed memories; Sheen, Kemp, & Rubin, 2001). In addition, most feedback about false memories is not as explicit as someone else telling you that a memory is theirs (and not yours). One of the most common approaches is to give the learner multiple study–test trials; however, success requires noticing that one’s intrusion was not actually on the list (e.g., Kensinger & Schacter, 1999; McDermott, 1996) or in the passage (Fritz, Morris, Bjork, Gelman, & Wickens, 2000; Kay, 1955). Learners establish a schema for the event, making it difficult to notice and correct memories that are schema-consistent.

The notion that learners may fail to notice their false memories, even when confronted again with the correct information, leads to our recommendation that a successful correction procedure must first draw attention to learners’ errors. This may be accomplished in several ways; perhaps the most straightforward approach is to present corrective feedback immediately after each error is committed (i.e., on a trial-by-trial basis). Essentially, we are proposing to tell the learner “no, sleep wasn’t on the list; it was bed” as soon as sleep is falsely recalled. Note that this prediction—that immediate feedback should best facilitate the correction of false memories—is in contrast to other findings in the broader feedback literature, in which feedback administered after a brief delay often yields improved performance, presumably because the delayed feedback serves as a spaced study trial (Butler, Karpicke, & Roediger, 2007; see Pashler, Rohrer, Cepeda, & Carpenter, 2007, for a review of the benefits of spacing practice over time). Importantly, however, these studies did not involve false memories; instead, they corrected errors in general knowledge (Smith & Kimball, 2010), history (Butler & Roediger, 2008), and engineering (Mullet, Butler, Verdin, von Borries, & Marsh, 2014) concepts. In Experiments 1 and 2, we evaluated the benefits of immediate versus delayed feedback for correcting false memories.

Of course, the provision of immediate, trial-by-trial feedback is not the only way to draw learners’ attention to their errors. In Experiment 2, we examined a second way to accomplish the same goal: explicitly asking learners to evaluate their past responses at the time that delayed feedback is presented (i.e., by asking, Was your [previous] answer correct?). Regardless of the specific procedural details, any situation that encourages learners to notice the discrepancies between their errors and the correct information should result in a reduction of false memories.

2. It is not enough to know that one was wrong; learners also need to know the correct information.

Unfortunately, although noticing that one has made a mistake is necessary for error correction, it is not sufficient. Requiring participants in DRM experiments to mark each error with an “X” while listening to the correct list of words read aloud yields a “depressingly high” (p. 1005) rate of error persistence (30%–50% of errors are recalled again later; McConnell & Hunt, 2007). The problem is that although this procedure met our first criterion for correcting false memories (by explicitly labeling errors), it did not tell learners what correct answer they should have provided instead. Indeed, much research with other materials, including facts and foreign language translations, has shown that merely providing learners with correct/incorrect feedback does little to help them correct their errors, and sometimes is not any better than no feedback at all (e.g., Fazio, Huelser, Johnson, & Marsh, 2010; Pashler et al., 2005). In contrast, feedback messages that expose participants to the correct answers are much more likely to promote successful error correction (e.g., Bangert-Drowns, Kulik, Kulik, & Morgan, 1991; Fazio et al., 2010; Shute, 2008).

In this case, we believe that false memories are no different from other types of errors: For optimal error correction, learners must not only be told that they have made a mistake, but also what the correct answer was. Indeed, related research on the continued-influence effect (Lewandowsky, Ecker, Seifert, Schwarz, & Cook, 2012) has shown that learners continue to rely on an initially presented but false explanation for an event (e.g., “gas cylinders and oil paints caused the warehouse fire”), even after that explanation has been retracted (similar to being told that an answer was “wrong” in the typical false memory paradigms described above). Strengthening the wording of the retraction (“paint and gas were never on the premises”) does not help; ironically, it backfires, increasing later reliance on the erroneous information. The most effective solution is to present an alternative account for why the event occurred (e.g., “arson materials were found at the scene”). Similarly, we predicted that supplying learners with correct information with which to replace their errors is a critical step in correcting false memories, and we tested this assumption in Experiment 3.

Our approach

In short, we believe that successful correction of false memories requires (1) drawing learners’ attention to the specific errors they have made and (2) providing them with the correct answers with which to replace their mistakes. Across three experiments, we provide evidence for both of these requirements, using pragmatic inference materials whereby sentences such as The karate champion hit the cinderblock are misremembered as The karate champion broke the cinderblock. Although we did not directly measure confidence in the experiments that follow, other work has shown that the remembered inferences are often accompanied by phenomenological experiences “indistinguishable from those of true memories” (Chan & McDermott, 2006, p. 633), including high confidence in one’s wrong responses (Sampaio & Brewer, 2009).

Experiment 1

Many studies with educational materials have shown that long-term retention is enhanced when learners receive feedback after a delay, rather than immediately after each item (e.g., Anderson, Kulhavy, & Andre, 1971; Butler & Roediger, 2008; Carpenter & Vul, 2011; Metcalfe, Kornell, & Finn, 2009; Mullet et al., 2014; Phye & Andre, 1989; Sassenrath & Yonge, 1968; Smith & Kimball, 2010). Interestingly, however, delaying feedback is not always beneficial; in particular, this advantage disappears when correcting high-confidence errors in general knowledge (e.g., Sydney is the capital of Australia or Vitamin C cures colds; Sitzman, Rhodes, & Tauber, 2014). In the case of these high-confidence errors, delaying feedback likely reduces the chance of the learner noticing the contradiction between the feedback and their prior mistake. We predicted that, similar to strongly held errors in general knowledge (Sitzman et al., 2014), false memories are another class of errors that would not benefit from delaying feedback. As we described earlier, false memories differ from most errors in that they are vivid, held with confidence, and involve thinking back to a particular time and place—factors that likely make it difficult to notice the contradiction between one’s error and the delayed feedback. The purpose of Experiment 1 was to test the prediction that immediate, trial-by-trial feedback is more effective for correcting false memories, because it enables a back-to-back comparison between the correct response and one’s error.

Method

Participants and design

Seventy-eight Duke University undergraduates participated in exchange for course credit (n = 26 for the immediate and n = 26 for each of two delayed feedback conditions; see the Procedure section for a full explanation of these groups).

Materials

We pilot tested McDermott and Chan’s (2006) materials and identified 30 items for use in our experiments (e.g., The ugly stepsisters asked Cinderella to mop the floor, often falsely remembered as The ugly stepsisters told Cinderella to mop the floor). Twelve additional sentences without pragmatic implications (e.g., The boy slipped on the banana peel) were created to make the task similar in length to that of McDermott and Chan. For each critical and filler item, we created a sentence fragment (e.g., The ugly stepsisters _____ Cinderella to mop the floor) to use on the initial and final tests. On average, 1.8 words were needed to complete each blank.

Procedure

The experiment was programmed with E-Prime 2.0 software. During study, participants were instructed to read and remember the sentences. They read the 42 sentences (30 critical sentences and 12 fillers) at a rate of 4 s each, with a 500-ms blank screen and a 1-s fixation point between trials. Participants then solved unrelated brainteasers for 10 min.

Next, participants completed the self-paced initial test. For each sentence fragment, they were instructed to fill in the missing word(s), being careful to use the exact wording from the sentence that they had studied. Participants were told not to guess, and they were asked to enter “I don’t know” if they could not remember the critical word(s).

For all participants, the feedback took the form of a 4-s re-presentation of the originally studied (correct) sentence (i.e., correct answer feedback). See Fig. 1 for a schematic of the design. For the immediate feedback condition, the feedback was presented immediately after each initial test trial. Participants in the immediate feedback condition then solved unrelated brainteasers for 15 min before beginning the final test. The delayed feedback was administered according to one of two schedules. In both schedules, participants completed the initial test (without viewing any feedback), solved unrelated brainteasers for 5 min, and then received the feedback, which was essentially another chance to see the study list. What differed across the schedules was the lag between the feedback presentation and the final test (10 min of brainteasers in the first schedule and 15 min of brainteasers in the second schedule). Thus, one delayed feedback condition equated the lag between the initial and final tests across feedback groups; in the other delayed feedback condition, the lag between the presentation of the feedback and the final test was equated. Both schedules were necessary to ensure that a possible advantage of delaying feedback was not due to the delayed feedback being presented closer in time to the final test (Metcalfe, Kornell, & Finn, 2009).

Fig. 1 A schematic of the design for the immediate feedback (IFB) condition and the two delayed feedback (DFB) conditions in Experiments 1 and 2. The first delayed feedback condition controlled for the lag between study and final test, whereas the second delayed feedback condition controlled for the lag between feedback and the final test.

Participants then completed the final test, which was exactly the same as the initial test except that none of the participants received feedback.
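The three feedback schedules described above can be laid out as a small timeline sketch to show which lag each delayed condition equates with the immediate condition. This is only an illustration: the names `Schedule`, `SCHEDULES`, and `matched_on` are hypothetical, and the sketch counts only the brainteaser intervals, ignoring the duration of the self-paced test and of the feedback itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    """Filler (brainteaser) intervals, in minutes, for one feedback condition."""
    name: str
    test_to_feedback_min: int   # filler between the initial test and the feedback
    feedback_to_final_min: int  # filler between the feedback and the final test

    @property
    def test_to_final_min(self) -> int:
        # Total filler time separating the initial test from the final test.
        return self.test_to_feedback_min + self.feedback_to_final_min


# Intervals taken from the Procedure; feedback in the immediate condition
# follows each trial, so no filler separates the test from the feedback.
SCHEDULES = [
    Schedule("immediate", 0, 15),
    Schedule("delayed_1", 5, 10),
    Schedule("delayed_2", 5, 15),
]


def matched_on(attribute: str, reference: Schedule) -> list[str]:
    """Names of the other schedules that share `attribute` with `reference`."""
    target = getattr(reference, attribute)
    return [
        s.name
        for s in SCHEDULES
        if s is not reference and getattr(s, attribute) == target
    ]


immediate = SCHEDULES[0]
print(matched_on("test_to_final_min", immediate))      # first delayed schedule
print(matched_on("feedback_to_final_min", immediate))  # second delayed schedule
```

Under these assumptions, each delayed schedule matches the immediate condition on exactly one of the two lags, which is why both were needed.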

Results and discussion

Data scoring and analysis

Participants’ responses were coded as correct, inference, “I don’t know,” or another wrong answer. Consistent with past research (McDermott & Chan, 2006), we identified a list of a priori responses that would be defined as correct or inference answers. For example, for the sentence The ugly stepsisters asked Cinderella to mop the floor, asked was classified as correct, and told, ordered, and forced were classified as inferences. Other responses that had not been defined a priori were coded as other wrong answers. Two independent coders scored the responses (Cohen’s kappa = .95), and a third coder resolved discrepancies. The results are presented in Table 1; note that changes in one response category across tests necessarily produce changes in the other categories (e.g., an increase in correct responses necessarily coincides with a decrease in incorrect answers). Because the two delayed feedback schedules resulted in virtually identical performance on both the initial and final tests, we collapsed across them to form one delayed feedback condition for the reporting of the statistical analyses here (but the interested reader can find the complete breakdown of means in Table 1). To examine the relative effectiveness of immediate and delayed feedback, we examined the proportions of final-test items completed correctly versus with the critical inferences in separate 2 (Test: initial, final) × 2 (Feedback Timing: immediate, delayed) analyses of variance (ANOVAs).
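The a priori coding scheme can be sketched as a simple lookup. In this illustration the key for the Cinderella item follows the example given above; the item identifier, the category labels, and the normalization step are hypothetical and are not a description of the coders' actual procedure.

```python
IDK = "i don't know"

# Hypothetical scoring key; only the Cinderella entry comes from the text.
SCORING_KEY = {
    "stepsisters": {
        "correct": {"asked"},
        "inference": {"told", "ordered", "forced"},
    },
}


def normalize(response: str) -> str:
    """Lowercase, straighten apostrophes, and collapse whitespace."""
    return " ".join(response.lower().replace("’", "'").split())


def code_response(item_id: str, response: str) -> str:
    """Assign a response to one of the four scoring categories."""
    answer = normalize(response)
    if answer == IDK:
        return "dont_know"
    key = SCORING_KEY[item_id]
    if answer in key["correct"]:
        return "correct"
    if answer in key["inference"]:
        return "inference"
    # Anything not defined a priori counts as another wrong answer.
    return "other_wrong"


for typed in ["asked", "Ordered", "I don’t know", "begged"]:
    print(typed, "->", code_response("stepsisters", typed))
```

Because the four categories are mutually exclusive and exhaustive, the proportions for any one test sum to 1, which is why a change in one category necessarily changes the others.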

Table 1 Proportions of sentence fragments answered correctly versus with inferences or other wrong answers for the immediate feedback condition and the two delayed feedback conditions of Experiment 1

Correct answers

Participants who received immediate feedback produced more correct answers initially (M = .24) than those who received delayed feedback (M = .19), F(1, 76) = 6.83, p = .01, MSE = .033, η² = .001. Although the advantage of the immediate feedback condition became numerically larger on the final test (M = .79 for the immediate vs. .68 for the delayed feedback condition), the Test × Feedback Timing Condition interaction was not significant, F(1, 76) = 2.22, p = .14, MSE = .01, η² = .003.

Inferences

Our primary interest was in the intrusion of inferences when recalling the original sentences. Initially, participants produced the critical inferences on about one-third of trials, and this rate did not differ across the immediate (M = .34) and delayed (M = .33) feedback conditions. Feedback was very helpful; overall, the proportion of sentence fragments completed with the critical errors was reduced close to floor levels (M = .07) on the final test. However, feedback timing mattered: Participants in the delayed feedback conditions used the inferences to complete 9% of the final sentence fragments, whereas participants receiving immediate feedback only completed fragments with the inferences 4% of the time [F(1, 76) = 4.67, p = .03, MSE = .007, η² = .01 for the Test × Feedback Timing interaction]. This pattern emerged even though the timing of the feedback manipulation was relatively subtle, with delayed feedback being administered only 5 min later than immediate feedback. Both immediate and delayed feedback greatly reduced the proportion of false memories produced across tests, but immediate feedback was more effective.
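In a 2 × 2 design, the Test × Feedback Timing interaction amounts to a difference between two differences: the drop in inferences across tests under immediate feedback minus the drop under delayed feedback. The sketch below computes that descriptive contrast from the condition means reported above; the function names are hypothetical, and the contrast by itself says nothing about significance, which the ANOVA assesses against error variance.

```python
def drop_across_tests(initial: float, final: float) -> float:
    """Reduction in the inference rate from the initial to the final test."""
    return initial - final


def interaction_contrast(cond_a: tuple, cond_b: tuple) -> float:
    """Difference between two conditions in their drop across tests."""
    return drop_across_tests(*cond_a) - drop_across_tests(*cond_b)


# (initial, final) inference proportions from Experiment 1.
immediate = (0.34, 0.04)
delayed = (0.33, 0.09)  # collapsed across the two delayed schedules

contrast = interaction_contrast(immediate, delayed)
print(round(contrast, 2))  # extra reduction under immediate feedback
```

With these means, immediate feedback reduced inferences by about 6 percentage points more than delayed feedback did.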

Experiment 2

As we predicted, but in contrast to the patterns often observed with other types of errors, immediate feedback was superior to delayed feedback in reducing the number of false memories. In addition to replicating Experiment 1, the central goal of Experiment 2 was to more clearly evaluate the reason for the advantage of immediate feedback, as well as to examine whether performance under delayed feedback could be improved to the same level (see Footnote 1). Specifically, we tested the idea that delayed feedback might be just as effective as immediate feedback if learners were explicitly required to compare the feedback messages to their prior responses. Broadly speaking, previous research has already shown that noticing discrepancies is critical for error correction. For example, participants who are told to replace their memories of a previously studied cue–target word pair (e.g., knee–bone) with an updated pair (e.g., knee–bend) are more successful at doing so if, at the time of encoding the second pair, they notice that the target word has changed (i.e., notice the discrepancy between the two pairs; Wahlheim & Jacoby, 2013). Moreover, a comparison of the data from Wahlheim and Jacoby’s Experiments 1 and 2 suggests that an explicit requirement to look for changes across trials may increase the likelihood of noticing such discrepancies. In line with this idea, in Experiment 2 we manipulated whether learners were explicitly prompted to compare each feedback message to their (past) response (thereby helping them notice any discrepancies). Directly after receiving the immediate or delayed feedback, half of the learners were required to answer the question “Was your [previous] answer correct?”, a judgment that should encourage them to bring their previous response to mind while viewing the feedback. This design allowed us to replicate the surprising benefit of immediate feedback from Experiment 1, as well as to examine whether another manipulation could successfully promote comparisons between one’s errors and the correct information.

Method

Participants and design

Participants were workers on Amazon’s Mechanical Turk (MTurk), an online marketplace where people complete tasks in exchange for payment. Our records from Qualtrics survey software—which was used to present the experiment—showed that 315 workers clicked on the experiment link, with many of them ultimately deciding not to complete the experiment (of those who quit, the vast majority [86%] did so when they had progressed through less than 30% of the survey). We continued collecting data on MTurk until we had reached our goal of 30 participants in each of the six conditions who had completed the full experiment (180 participants in total).

Materials

The materials were the same as in Experiment 1, but the experiment was presented using Qualtrics survey software.

Procedure

As in Experiment 1, each participant received either immediate or delayed feedback, and the delayed feedback was administered according to one of two schedules (see Fig. 1). Upon receiving the feedback message, half of the learners were asked “Was your answer correct?” and responded either “yes” or “no” on each trial. This question was meant to ensure that learners would make a direct comparison between their own answers and the feedback message.

Results and discussion

Data scoring and analysis

Scoring was the same as in Experiment 1 (Cohen’s kappa = .96); the means are shown in Table 2. The proportions of correct and inference answers were included in separate 2 (Test: initial, final) × 2 (Feedback Timing: immediate, delayed) × 2 (Presence of Follow-Up Question: yes, no) ANOVAs. As a manipulation check, we note that participants in all three conditions with the follow-up question were very good at judging whether the feedback matched their answers (M = .91).
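The manipulation check can be read as the proportion of trials on which a participant's "yes" or "no" judgment agreed with how the initial response was actually scored. The sketch below shows one way such a proportion could be computed; the function name and the example trials are hypothetical and are not data from the experiment.

```python
def judgment_accuracy(trials) -> float:
    """Proportion of trials on which the self-judgment matched the scoring.

    `trials` is an iterable of (was_actually_correct, said_yes) pairs,
    where `said_yes` is the answer to "Was your answer correct?".
    """
    trials = list(trials)
    if not trials:
        raise ValueError("at least one trial is required")
    matches = sum(actual == said_yes for actual, said_yes in trials)
    return matches / len(trials)


# Made-up trials for illustration only.
example_trials = [
    (True, True),    # correct answer, judged correct
    (False, False),  # inference error, judged incorrect
    (False, True),   # inference error, wrongly judged correct
    (False, False),  # other wrong answer, judged incorrect
]
print(judgment_accuracy(example_trials))
```

A mismatch of the third kind, an error judged as correct, is the case of most interest here, because it marks a discrepancy that the learner failed to notice.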

Table 2 Proportions of sentence fragments answered correctly versus with inferences or other wrong answers in Experiment 2, as a function of (1) whether participants received immediate feedback or one of the two delayed feedback schedules and (2) whether the feedback message was paired with a follow-up question

Correct answers

Overall, participants correctly answered about 19% of the initial test trials; after receiving feedback, correct responding increased dramatically (M = .57 on the final test). We found no main effect of feedback timing, F(1, 176) = 1.07, p = .30, MSE = .09, η² = .004, and no main effect of the presence of the follow-up question, F < 1. There was, however, a significant Test × Follow-Up Question interaction, F(1, 176) = 5.23, p = .02, MSE = .02, η² = .001; the increase in correct responding across tests was greater when participants were required to answer the follow-up question (increasing from .17 to .59) than when they were not (a smaller increase, from .21 to .55).

Inferences

Again, we were most interested in the production and subsequent reduction of false memories. As can be seen in Table 2, there was some baseline variability across the conditions, possibly due to the less controlled conditions involved when testing MTurk participants; what is more important is the reduction in the rate of false memories across tests. Replicating Experiment 1, we observed a significant Test × Feedback Timing interaction, F(1, 176) = 12.83, p < .001, MSE = .012, η² = .06, with the reduction in errors across tests depending on whether participants had received immediate or delayed feedback. Whereas the inference rate decreased by 28 percentage points in the immediate feedback condition, it only dropped by 18 points after delayed feedback. These data replicated the main finding of Experiment 1.

Critically, we also found a significant Test × Follow-Up Question interaction, F(1, 176) = 12.35, p = .001, MSE = .012, η² = .06; the reduction in errors varied depending on whether learners had been required to answer the follow-up question (“Was your answer correct?”) when they received feedback. On the initial test, learners in the follow-up question conditions produced slightly more errors than those in the control conditions (M = .35 vs. .33), but their error rate dropped to only 10% on the final test, after they had been explicitly prompted to compare their responses to the feedback, as opposed to a final inference rate of 15% for the control condition.

Although the three-way interaction of test, feedback timing, and presence of a follow-up question did not reach significance [F(1, 176) = 2.00, p = .16, MSE = .012, η² = .01], Table 2 suggests that reminding participants to explicitly compare the feedback to their responses was effective at mitigating the negative effects of delayed feedback. To confirm this impression, we conducted additional analyses on the data from the delayed feedback conditions. We directly compared the proportions of questions answered with inferences as a function of whether or not participants had received the follow-up question. There was no difference in the initial rate of inferences as a function of whether the delayed feedback was paired with the follow-up question (t < 1); however, participants who received delayed feedback with the follow-up question responded with fewer inferences on the final test (M = .10) than did those who received delayed feedback without the follow-up question (M = .18), t(118) = 3.16, p = .002, d = 0.58. The addition of the follow-up question (which encouraged a direct comparison between one’s errors and the correct answers) helped learners avoid reproducing their mistakes, even when they received delayed feedback.
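The effect size for this between-groups comparison can be recovered from the t statistic with the standard conversion d = t × sqrt(1/n1 + 1/n2). The sketch below assumes 60 participants per group (two delayed schedules × 30 participants, with and without the follow-up question), which is consistent with the reported 118 degrees of freedom. The text does not state how d was computed, so the conversion is an assumption, although it reproduces the reported value.

```python
import math


def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Convert an independent-samples t statistic to Cohen's d."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# Assumed group sizes: 2 delayed schedules x 30 participants per group.
n_with_question = 60
n_without_question = 60
assert n_with_question + n_without_question - 2 == 118  # matches t(118)

d = cohens_d_from_t(3.16, n_with_question, n_without_question)
print(round(d, 2))  # 0.58
```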

Experiment 3

In Experiments 1 and 2, participants benefited from correction conditions that encouraged them to notice discrepancies between their errors and the correct information, either through trial-by-trial feedback or a specific prompt to compare one’s previous answer to the feedback message. As we described earlier, however, noticing one’s error is not the only important step in eliminating false memories. In addition to knowing that they were wrong, learners must also know what the correct answer is—information that was always provided by the feedback messages in the first two experiments. The purpose of Experiment 3 was to directly test the assumption that the feedback must provide information with which to replace one’s errors, by comparing the effectiveness of three different immediate feedback procedures: (1) correct answer feedback (as had been presented in Experiments 1 and 2), (2) correct/incorrect feedback that only told learners when they had made a mistake, and (3) no feedback.

Method

Participants and design

Fifty-seven Duke undergraduates participated in exchange for course credit (n = 19 for each of the no-feedback, correct/incorrect feedback, and correct answer feedback groups).

Materials

The materials were the same as in Experiments 1 and 2. As in Experiment 1, the experiment was presented using E-Prime 2.0 software.

Procedure

The procedure was similar to those of Experiments 1 and 2, with two differences. First, if participants could not remember the critical word(s) on the initial or final test, they were asked to enter a plausible guess (see Footnote 2). Second, participants always received feedback immediately after each trial, but the content of this feedback message differed across conditions. Specifically, participants viewed either a blank screen (no-feedback condition), a statement that their response was “Correct!” or “Incorrect” (correct/incorrect feedback condition), or the original studied sentence (correct answer feedback condition) for 4 s. After the initial test, and before completing the final test, participants solved unrelated brainteasers for 10 min.
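The three feedback conditions differ only in what appears on the 4-s screen that follows each initial test trial. A minimal sketch of that logic is given below; the enum, the constant, and the function name are hypothetical and do not describe the E-Prime script that was actually used.

```python
from enum import Enum


class FeedbackType(Enum):
    NONE = "no feedback"
    CORRECT_INCORRECT = "correct/incorrect"
    CORRECT_ANSWER = "correct answer"


FEEDBACK_DURATION_S = 4  # every feedback screen was shown for 4 s


def feedback_message(
    condition: FeedbackType, studied_sentence: str, response_is_correct: bool
) -> str:
    """Text displayed after an initial test trial in a given condition."""
    if condition is FeedbackType.NONE:
        return ""  # blank screen
    if condition is FeedbackType.CORRECT_INCORRECT:
        return "Correct!" if response_is_correct else "Incorrect"
    # Correct answer feedback: re-present the originally studied sentence.
    return studied_sentence


sentence = "The karate champion hit the cinderblock"
for condition in FeedbackType:
    shown = feedback_message(condition, sentence, response_is_correct=False)
    print(f"{condition.value}: {shown!r}")
```

Only the correct answer condition exposes the learner to the information needed to replace an error; the correct/incorrect condition flags the error without supplying a replacement.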

Results and discussion

Data scoring and analysis

Again, two independent coders scored the responses (Cohen’s kappa = .93), and a third coder resolved discrepancies. The computer scored the initial test responses in the correct/incorrect feedback condition (with 98% accuracy). The proportions of correct and inference responses were included in separate 2 (Test: initial, final) × 3 (Feedback Type: no feedback, correct/incorrect, correct answer) ANOVAs.
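Cohen's kappa expresses the agreement between two coders after correcting for the agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the proportion expected from each coder's category frequencies. The sketch below implements this definition on a made-up set of six codes; the data and the function name are illustrative only.

```python
from collections import Counter


def cohens_kappa(coder_a: list, coder_b: list) -> float:
    """Chance-corrected agreement between two coders' category labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item set")
    n = len(coder_a)
    # Observed agreement: proportion of items given the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement: product of the coders' marginal proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Made-up codes for six responses (not data from the experiment).
coder_a = ["correct", "inference", "inference", "other", "correct", "inference"]
coder_b = ["correct", "inference", "other", "other", "correct", "inference"]
print(cohens_kappa(coder_a, coder_b))
```

In this toy example the coders agree on five of six items (p_o = .83) and chance agreement is one-third, giving a kappa of .75.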

Correct answers

As in the prior experiments, participants correctly completed a small proportion of the sentence fragments on the initial test (M = .21). We observed no differences across the three feedback conditions.

Feedback dramatically improved performance across tests, as was reflected in a significant interaction between test and feedback type, F(2, 54) = 302.00, p < .001, η² = .54. The no-feedback group showed no improvement from the initial to the final test, t(18) = 1.28, p = .21, d = 0.31. In contrast, the correct/incorrect group gained 7 percentage points, t(18) = 5.40, p < .001, d = 1.20. Improvement was much more impressive in the correct answer condition, t(18) = 22.05, p < .001, d = 5.10; after responding correctly on only 22% of the initial test trials, these participants produced the correct sentences 80% of the time on the final test (see Footnote 3).

Inferences

Participants initially produced inferences on nearly half of the trials (M = .47); these error rates were almost identical across the three feedback conditions (see Table 3). The somewhat higher rate of inferences in this experiment than in Experiments 1 and 2 likely reflects the instruction, new to this experiment, to enter a plausible guess rather than “I don’t know” when the critical word(s) could not be remembered.

Table 3 Proportions of sentence fragments answered correctly versus with inferences or other wrong answers for the correct answer feedback, correct/incorrect feedback, and no-feedback conditions of Experiment 3

Most importantly, inference rates on the final test were clearly influenced by the type of feedback, as was shown in a significant interaction between test and feedback condition, F(2, 54) = 79.26, p < .001, η² = .37. Participants who did not receive feedback produced about as many inferences on the final test as they had initially, t(18) = 1.38, p = .19, d = 0.33. In contrast, those who received correct/incorrect feedback completed fewer sentence fragments with the inferences, t(18) = 5.33, p < .001, d = 1.22. However, the magnitude of this decrease was much smaller than the decrease observed in the correct answer feedback condition, t(18) = 16.43, p < .001, d = 4.14; these participants rarely (M = .08) produced an inference on the final test, even though they had initially produced those answers about half of the time.

General discussion

False memories are notoriously difficult to prevent and correct, persisting despite warnings and multiple opportunities to study the correct information (Anastasi, Rhodes, & Burns, 2000; Gallo et al., 1997; Kensinger & Schacter, 1999; McDermott & Roediger, 1998; Neuschatz, Payne, Lampinen, & Toglia, 2001; Watson et al., 2004). Nonetheless, we found that the vast majority of these errors were easily eliminated when the correction procedure abided by two general principles.

First, learners needed to notice that they had made an error; errors were greatly reduced when learners received trial-by-trial feedback and/or a specific prompt to compare their own answers to the feedback message. In the absence of these sorts of conditions, we suspect that many false memories may go unnoticed, in part because these errors are held with high confidence and vividness. Moreover, the close semantic relationship between a false memory and the correct information likely also contributes to learners’ failures to notice their mistakes. That is, the false memory for sleep occurs precisely because this word shares a strong semantic relationship with the presented words; similarly, the false memory that The new baby cried all night is produced precisely because the correct version of the sentence was “designed to lead the listener to make schema-based inferences” (Sampaio & Brewer, 2009, p. 159). By definition, learners’ errors share close semantic overlap with the correct information, which likely makes these errors particularly difficult to notice. The literature on semantic illusions supports the same point: Learners are surprisingly willing to provide meaningful answers to nonsensical questions such as How many animals of each kind did Moses take on the ark?, as long as the errorful term (Moses) shares a close semantic overlap with the correct answer (Noah; Erickson & Mattson, 1981; van Oostendorp & de Mul, 1990).

It is likely that students are often more aware of their errors in educational contexts, both because the erroneous responses may be made with lower confidence and because the contrast between one’s error and the correct information is more obvious (e.g., it is easy to see that one has made an error when the correct answer to a math problem is 14, but one has produced the answer 6). There are, of course, exceptions to this generalization. For example, low-performing students are often thought to be “doubly cursed,” in that they lack both the knowledge to perform well on academic tasks and the awareness that they are poor performers (Kruger & Dunning, 1999). Thus far, this “unskilled but unaware” phenomenon has been primarily demonstrated in terms of global predictions of performance (e.g., low-performing students tend to overpredict their overall performance on an upcoming test); however, it is also possible that it would apply, on an item-by-item basis, to evaluations of previous correctness. In other words, low performers may struggle to detect discrepancies between their own (wrong) answers and corrective feedback, thus perpetuating their poor performance.

As we argued previously, however, noticing one’s error is only the first step toward error correction: Learners must also be provided with correct information in order to replace their mistakes. This conclusion has already been demonstrated many times in the broader literature on error correction (Bangert-Drowns et al., 1991; Fazio et al., 2010; Shute, 2008); here, we showed that the same rule applies for decreasing the production of false memories (see also Lewandowsky et al., 2012). Previous attempts to correct these strongly held errors often failed to abide by this principle, likely explaining the high rates of error persistence (e.g., McConnell & Hunt, 2007). The conclusion that effective feedback must go beyond identifying an error as “wrong” is also consistent with other findings that “forgetting” an old memory is more difficult than replacing it. For example, in the directed-forgetting literature, participants first encode some items (e.g., a list of words) and are then asked to intentionally forget that material and to learn some new information (e.g., a different list of words). Importantly, the encoding of List 2 is critical in facilitating the forgetting of List 1. In other words, after receiving the “forget” cue, participants must receive competing material to encode; otherwise, they are unable to forget the first list (Gelfand & Bjork, 1985; Pastötter & Bäuml, 2007, 2010).

The present results demonstrate that feedback can be effectively used to correct false memories, so long as learners receive correct information with which to replace their error. Moreover, such feedback is most useful when received directly after committing an error, or when the correction conditions otherwise facilitate a direct comparison between learners’ errors and the correct answers. Despite the surprising robustness of false memories in the face of other correction attempts, abiding by both of these recommendations enables learners to eliminate nearly all of their mistakes. However, one issue for future research involves the translation of this work from the laboratory to the real world. That is, it is less clear that it will always be possible to abide by both recommendations in correcting many types of real-world errors. For example, in contrast to translations of foreign words or definitions of obscure vocabulary, it is much less likely that an objective “truth” could be compared to one’s personal memory. That is, barring photographs or a video recording, no dictionary or other reference volume can provide corrective feedback for false memories of one’s personal experiences. Moreover, one’s personal memories may engender a sense of reliving or may elicit high confidence beyond the level simulated in the present experiments, lessening people’s willingness to accept corrections.