Delay and déjà vu: Timing and repetition increase the power of false evidence

In 2011, the police shooting of a civilian led to large-scale rioting in London. People posted images on social networking sites depicting mass destruction, and these images caused panic among London residents. Some of the images ultimately turned out to be fakes (Flock, 2011). Many anecdotes show that sophisticated image-editing software can create compelling doctored images, and science shows that doctored images can induce wholly false autobiographical memories (e.g., Wade, Garry, Read, & Lindsay, 2002). In this article, we ask whether a brief delay in the timing of such false evidence, or the number of times the evidence is shown, influences its effect.

According to the source-monitoring framework, we reconstruct past events using the information we have available in the present (Johnson, Hashtroudi, & Lindsay, 1993). Specifically, we decide whether our mental products—images, thoughts, or feelings—stem from genuine experience or from mental activity such as imagination, fantasy, or dreams. To make this decision, we usually evaluate our mental products on multiple dimensions (e.g., familiarity, consistency), automatically and without awareness. But sometimes these decisions go awry, and we decide that false autobiographical experiences are real. Mazzoni and Kirsch’s (2002) metacognitive model further posits that when we encounter suggestions that contradict our beliefs and memories, including false images, we reconsider the characteristics of our mental products. If our beliefs and memories do not meet the required criteria, we turn to the information in our environment—which may or may not be accurate—to confirm what happened.

Previous studies demonstrate that false evidence affects distant memories—such as memories of a childhood hot air balloon ride (Garry & Wade, 2005)—as well as recent events. Nash and Wade (2009), for example, showed people doctored videos of themselves cheating in a gambling task. The false evidence led subjects to falsely confess to cheating, to believe that they had cheated, and to confabulate details about how they had cheated. Such studies show that people can generate a variety of rich false beliefs and memories that can influence their behavior. What is not known, however, is whether the timing of false evidence, or a combination of timing and repetition, influences its effect. To address this question, we manipulated timing (Experiment 1) and timing plus repetition (Experiment 2) using a novel procedure in which subjects were wrongly accused of cheating on a driving task.

Experiment 1

In Experiment 1, we accused subjects of cheating and then exposed them to false video evidence either immediately or after a brief (9-min) delay. Metacognition research suggests that subjects who encounter false evidence after a delay will be more likely to believe that they cheated, and to confabulate details, than subjects who encounter false evidence immediately. Most laypeople believe that memory loss for an event is initially rapid and levels out over time—a belief consistent with scientific research (Desmarais & Read, 2011). Presumably, then, late-evidence subjects should be more likely to question their memories because time has passed since the target event. Accordingly, the metacognitive model (Mazzoni & Kirsch, 2002) predicts that late-evidence subjects should be more likely than early-evidence subjects to reject their beliefs or memories and to turn to the video evidence for confirmation. Indeed, misinformation is generally more powerful when presented after a delay, close to the memory test, than when presented immediately after an event (Frost, 2000; Loftus, Miller, & Burns, 1978).

However, unlike the delay periods used in previous studies, our delay was strikingly short, and subjects’ memories of the target event should remain strong until they encounter the false video. Perhaps subjects would not turn to the false video evidence to verify their beliefs. If so, late-evidence subjects should be no more likely than early-evidence subjects to develop false beliefs.

Method

Subjects

Seventy-five 18- to 50-year-olds from Warwick University participated in a hazard perception driving test for credit or £3. We randomly allocated subjects to the control (n = 25, M = 21.84 years, SD = 6.76), early-evidence (n = 25, M = 20.84 years, SD = 2.76), or late-evidence (n = 25, M = 19.80 years, SD = 2.58) condition.

Materials and procedure

Subjects individually completed a 20-min hazard perception test. On-screen instructions informed subjects that they would view 14 video clips, plus a practice clip, of driving situations (Fig. 1). The video clips were 15–95 s in length (M = 46.27 s, SD = 20.40 s), were viewed sequentially, and were labeled clip 1, clip 2, and so on. Subjects could score points by clicking the mouse when they saw hazards, defined as “something which causes the driver to slow down, stop, or change direction.” When subjects clicked, a flag appeared at the bottom of the screen. Subjects were told to click only when the traffic light in the top right corner was green; clicking when the light was red would be classed as cheating, which would be taken seriously and might result in disqualification from the experiment. Subjects were also told that they would be notified during the test when a “bonus” clip was about to appear and that, if they identified all the hazards in that clip and scored the highest of all subjects, they would win £50. This monetary incentive encouraged subjects to pay attention to the crucial bonus clip, which was always clip 3.

Fig. 1 Screenshot of hazard perception driving test

Immediately after the bonus clip, all subjects were falsely told on-screen that they had clicked during a red light and, thus, would be disqualified from the experiment. Control subjects were not shown any false evidence. Early-evidence subjects were shown false evidence: a supposed “replay” of their clicking, in which red flags represented the subjects’ clicks. For a minimum of one and a maximum of five of these clicks, the software overlaid a red light where there had been a green light, making it appear as if the subjects had clicked during a red light (Fig. 2). Early-evidence subjects then continued with the driving test. Late-evidence subjects continued for a further 9 min and were shown the false evidence at the end of the test.

Fig. 2 Process of doctoring the replay

Finally, subjects were interviewed—supposedly to provide feedback about the test—by the first author, who was blind to subjects’ conditions. The experimenter read 10 statements aloud, and subjects indicated their agreement on a 5-point scale from 1 (strongly disagree) to 5 (strongly agree). Two statements asked subjects how useful they found the test (e.g., “Hazard-perception tests like this are helpful to learner drivers”), and eight statements asked about their performance on the test (e.g., “I performed consistently across clips”). The critical statement, “I believe I cheated on the bonus clip,” served as a self-report measure of false belief. Subjects also described or explained what had happened, as a measure of confabulation. If subjects could not initially remember which clip they had been disqualified for, they were probed for details about what had happened in the clip, and the critical statement was rephrased to “I believe I clicked on a red light in the bonus clip/clip 3” when necessary. Subjects were then debriefed and asked to provide retrospective consent.

Results

First, we examined subjects’ self-reported belief ratings (Fig. 3). A Kruskal–Wallis test showed a significant effect of condition on belief ratings, χ²(2, N = 75) = 12.29, p = .002, η² = .17. Follow-up Mann–Whitney U tests revealed that late-evidence subjects reported higher belief scores than did early-evidence subjects (average rank: late = 31.12, early = 19.88; z = −2.86, p = .004). Late-evidence subjects also reported higher belief scores than did control subjects (average rank: late = 31.74, control = 19.26; z = −3.15, p = .002). Belief ratings did not differ significantly between early-evidence and control subjects (p = .409).

Fig. 3 Mean self-reported belief ratings as a function of condition in Experiment 1 (left) and Experiment 2 (right)
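To make this analysis pipeline concrete, the sketch below shows how such an omnibus test and a follow-up comparison might be run in Python with SciPy. The ratings are simulated, and the η² estimate and the z approximation are standard textbook formulas used here purely for illustration; this is not a record of our exact analysis code.

```python
# A minimal sketch of the belief-rating analysis, using simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.integers(1, 4, size=25)  # hypothetical 5-point ratings, n = 25 per group
early = rng.integers(1, 4, size=25)
late = rng.integers(2, 6, size=25)

# Omnibus Kruskal-Wallis test across the three conditions.
h, p = stats.kruskal(control, early, late)
k, n = 3, 75
eta_squared = (h - k + 1) / (n - k)  # a common eta-squared estimate for Kruskal-Wallis
print(f"H = {h:.2f}, p = {p:.3f}, eta^2 = {eta_squared:.2f}")

# Follow-up Mann-Whitney U test (late vs. early), with a normal-approximation z.
u, p_mw = stats.mannwhitneyu(late, early, alternative="two-sided")
n1 = n2 = 25
z = (u - n1 * n2 / 2) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # ignores tie correction
print(f"U = {u:.1f}, z = {z:.2f}, p = {p_mw:.3f}")
```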

Two judges, blind to subjects' conditions, used Redlich and Goodman's (2003) criteria to classify subjects as reporting no, partial, or full confabulation (Fig. 4). Subjects partially confabulated if they speculated about what had happened (“I was looking at the screen. Maybe I didn’t notice it”) and fully confabulated if they described how the cheating occurred (“I wasn’t looking at the lights, I just didn’t realize”). The judges agreed on 81.3% of categorizations (κ = .69), and we accepted the more conservative category in disputed cases.

Fig. 4 Percentage of subjects coded as providing no confabulation, partial confabulation, or full confabulation of the cheating as a function of condition in Experiment 1 (left) and Experiment 2 (right)
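As an illustration of this reliability check, a Cohen's kappa computation of the kind reported above might look as follows in Python; the judges' codes in the sketch are hypothetical, not our actual data.

```python
# A minimal sketch of inter-rater agreement (Cohen's kappa) on hypothetical codes.
from collections import Counter

CATEGORIES = ("none", "partial", "full")
judge_a = ["none", "partial", "full", "none", "partial", "none", "full", "none"]
judge_b = ["none", "partial", "full", "none", "full", "none", "full", "partial"]

n = len(judge_a)
observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n

# Expected chance agreement from each judge's marginal category frequencies.
freq_a, freq_b = Counter(judge_a), Counter(judge_b)
expected = sum(freq_a[c] * freq_b[c] for c in CATEGORIES) / n**2

kappa = (observed - expected) / (1 - expected)
print(f"agreement = {observed:.1%}, kappa = {kappa:.2f}")
```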

For full confabulations, there were no significant differences between groups (all ps ≥ .208). However, when we collapsed across the partial and full confabulation categories, significantly more late-evidence subjects confabulated than did control subjects, χ²(1, N = 50) = 5.20, p = .023, Cramer’s V = .32. Early-evidence subjects did not differ significantly from late-evidence or control subjects (both ps ≥ .157). Most subjects who confabulated said that they did not notice the red light because they were focusing on identifying hazards on the main screen (e.g., “I wasn't looking at the lights, I just didn't realize”).
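A sketch of one such collapsed 2 × 2 comparison is shown below; the cell counts are hypothetical and chosen only to match the group sizes, and the Cramer's V formula is the usual one for a contingency table.

```python
# A minimal sketch of a 2x2 condition-by-confabulation comparison
# (hypothetical counts, not our actual cell frequencies).
import numpy as np
from scipy.stats import chi2_contingency

# Rows: late-evidence vs. control; columns: confabulated vs. did not (n = 25 per row).
table = np.array([[13, 12],
                  [5, 20]])

chi2, p, dof, _ = chi2_contingency(table, correction=False)
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"chi2({dof}, N = {n}) = {chi2:.2f}, p = {p:.3f}, Cramer's V = {cramers_v:.2f}")
```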

In summary, our results show that delaying false evidence by a mere 9 min enhances its effect. In line with Mazzoni and Kirsch’s (2002) metacognitive model, subjects’ memories of the bonus clip may have faded enough in this period to prompt them to turn to external evidence—the fabricated video—for verification. Subjects then used the video to construct details about how the cheating happened.

A possible counterexplanation for this pattern of results is that the delay between the false video and the interview, rather than the delay between the target event and the false video, increased false beliefs and confabulation. This possibility was tested in Experiment 2. However, the main goal of Experiment 2 was to investigate the effect of repeating false evidence and the influence of time between presentations.

Experiment 2

In Experiment 2, we exposed people to false evidence twice, either with or without a delay in between. Repeating false evidence should make it more compelling: We tend to judge information as true when it is repeated—the illusion-of-truth effect (e.g., Hasher, Goldstein, & Toppino, 1977)—and this holds even if we are initially told that the information is false (Begg, Anas, & Farinacci, 1992). Repetition likely promotes fluency, making information “easy on the mind” to perceive and recall (Jacoby & Kelley, 1987). Fluency is sometimes confused with familiarity, which we in turn interpret as an indication of “truth” (Alter & Oppenheimer, 2009). Repetition may also make information appear more credible, and credible images are more easily confused with similar real memories (Nash, Wade, & Brewer, 2009). These mechanisms may be linked, with familiarity acting as an indicator of credibility (Foster, Huthwaite, Yesberg, Garry, & Loftus, 2012).

We also know from Experiment 1 that we are more likely to turn to false evidence after a delay. What we do not know, however, is how timing and repetition work in combination. Some studies have found an effect of repetition in a single session (Foster et al., 2012; Zaragoza & Mitchell, 1996); others have not (Warren & Lane, 1995). A delay between repetitions, or between repetitions and testing, may be necessary to observe an effect. Indeed, repeating misleading information after 1 month leads to memory errors (Gobbo, 2000), and memory errors are observed in single sessions when there is a delay between repetitions and testing (Weaver, Garcia, Schwarz, & Miller, 2007).

In Experiment 2, we investigated the roles of timing and repetition. We presented subjects with false evidence once immediately after the event (control), twice immediately after the event (immediate-repeated), or twice over time (delay-repeated). We also included an immediate-repeated-test group, who viewed the evidence twice immediately, like the immediate-repeated group, but finished the experiment straight after viewing the false evidence. Comparing the immediate-repeated-test group with the immediate-repeated and delay-repeated groups enabled us to disentangle the influence of timing from that of repetition.

Method

Subjects

We randomly allocated 120 adults, 17–44 years old, to the control (n = 30, M = 21.70 years, SD = 4.98), immediate-repeated (n = 30, M = 19.97 years, SD = 1.33), delay-repeated (n = 30, M = 20.47 years, SD = 2.26), or immediate-repeated-test (n = 30, M = 21.33 years, SD = 3.06) condition.

Procedure

The procedure followed that of Experiment 1, with three minor amendments. First, subjects viewed a doctored photograph depicting a click during a red light, rather than a video. Second, the interview included six questions rather than 10. Finally, we rephrased the critical statement to measure both compliance and false belief. Specifically, subjects responded yes/no to the statement “I cheated on the bonus clip” (compliance) and rated their belief in this item on a 7-point scale from 1 (strongly disbelieve) to 7 (strongly believe) (belief).

Control subjects viewed the evidence once, immediately after the accusation. Immediate-repeated subjects viewed the evidence twice sequentially and then continued with the test for approximately 9 min. Delay-repeated subjects viewed the evidence once initially and then again at the end of the test. Immediate-repeated-test subjects viewed the evidence twice sequentially but were then told that because they had cheated, they could no longer continue and were tested immediately.

Results

Preliminary analyses revealed that immediate-repeated and immediate-repeated-test subjects did not differ significantly on compliance, χ²(1, N = 60) = 0.11, p = .739, Cramer’s V = .04, belief, z = −0.56, p = .576, or confabulation, χ²(1, N = 60) = 0.00, p = 1.000, Cramer’s V = .00. This result confirms that in Experiment 1, it was the delay between the event and the evidence, rather than the delay between the evidence and testing, that promoted false beliefs and confabulation. In the subsequent analyses, we therefore collapse across these two conditions and refer to them as the immediate-repeated group.

For the yes/no compliance measure, both immediate-repeated, χ²(1, N = 90) = 4.94, p = .026, Cramer’s V = .23, and delay-repeated, χ²(1, N = 60) = 11.88, p = .001, Cramer’s V = .45, subjects were more likely to say yes than were controls. There was a nonsignificant tendency for delay-repeated subjects to say yes more often than immediate-repeated subjects (Fisher’s exact p = .055, Cramer’s V = .21).
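For a borderline comparison like this one, a Fisher's exact test might be computed along the following lines; the counts are hypothetical and match only the group sizes (30 delay-repeated, 60 collapsed immediate-repeated).

```python
# A minimal sketch of the Fisher's exact comparison on yes/no compliance
# (hypothetical counts; rows = delay-repeated vs. immediate-repeated,
# columns = said "yes" vs. said "no").
from scipy.stats import fisher_exact

odds_ratio, p = fisher_exact([[24, 6], [38, 22]], alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.3f}")
```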

A Kruskal–Wallis test revealed that self-reported belief scores differed significantly across conditions, χ²(2, N = 120) = 19.73, p < .001, η² = .17. Follow-up Mann–Whitney U tests showed that delay-repeated subjects reported higher belief than did immediate-repeated subjects (average rank: delay-repeated = 52.93, immediate-repeated = 41.78; z = −2.08, p = .038). Delay-repeated subjects also reported higher belief than did control subjects (average rank: delay-repeated = 40.18, control = 20.82; z = −4.46, p < .001). Immediate-repeated subjects likewise reported higher belief than did control subjects (average rank: immediate-repeated = 50.93, control = 34.63; z = −2.89, p = .004).

Two judges coded for confabulation and agreed on 84.2% of cases (κ = .74; Fig. 4). Significantly more immediate-repeated, χ²(1, N = 90) = 4.99, p = .026, Cramer’s V = .24, and delay-repeated, χ²(1, N = 60) = 13.61, p < .001, Cramer’s V = .48, subjects confabulated fully than did control subjects. Significantly more delay-repeated subjects than immediate-repeated subjects, χ²(1, N = 90) = 4.36, p = .037, Cramer’s V = .22, also confabulated fully. When we collapsed across the partial and full confabulation categories, significantly more delay-repeated subjects confabulated than did either control subjects, χ²(1, N = 60) = 11.43, p = .001, Cramer’s V = .44, or immediate-repeated subjects, χ²(1, N = 90) = 4.47, p = .034, Cramer’s V = .22. There was a nonsignificant trend for immediate-repeated subjects to confabulate more than control subjects, χ²(1, N = 90) = 3.45, p = .063, Cramer’s V = .20. Thus, subjects were more likely to confabulate details when the evidence was repeated, especially when it was repeated with a delay.

In summary, presenting false evidence more than once enhances its effect, perhaps because the evidence becomes more familiar and, thus, more credible (Foster et al., 2012). The results from Experiment 2 extend those from Experiment 1 by showing that people are more likely to turn to external evidence after a delay and, thus, that repetition over time is more persuasive than immediate repetition. Indeed, when the evidence was repeated with a delay, subjects were 20% more likely to confabulate details about how they had cheated than when the evidence was repeated without a delay.

General discussion

Together, our results show that delaying the presentation of false evidence by only 9 min enhances its effect. Moreover, timing and repetition have an additive effect, such that repeating false evidence over time is more powerful than repeating it immediately. Our findings help to refine Mazzoni and Kirsch’s (2002) metacognitive model by showing that we are more likely to turn to environmental evidence after a delay, when our memory of the target event has faded, and then use that evidence to construct details about what happened. Experiment 2 suggests that repeating false evidence makes it appear more familiar and credible and, thus, more believable. This fits with a fluency account of remembering (e.g., Jacoby & Kelley, 1987).

Our results are based on an accusation of cheating that meets ethical requirements but differs dramatically, and in numerous ways, from complex acts involving intentionality (e.g., committing crimes). As such, the data cannot be used to directly predict the probability of false confessions, but it is reasonable to assume that the mechanisms underlying our false-evidence effects might contribute to different types of compliant behavior and illusory beliefs. Furthermore, our results demonstrate confabulation, which research shows can induce rich false memories over time (e.g., Chrobak & Zaragoza, 2008).

Why should we care whether a brief delay, or the repetition, of fabricated evidence affects beliefs and confabulations? On a practical level, our findings raise questions about the disclosure of evidence in police interrogations. Prominent training manuals encourage police officers to present suspects with false evidence, to repeat statements affirming the suspect’s guilt, or to be persistent in their line of questioning (Inbau, Reid, Buckley, & Jayne, 2005; National Policing Improvement Agency, 2009). Yet our results suggest that when false evidence is used, repeatedly presenting it—especially over time—could promote illusory beliefs and false confessions. Our results may also have implications for everyday experiences. Research shows that thinking about “what might have been”—or counterfactual thinking—motivates behavior change and improves performance (Markman, McMullen, & Elizaga, 2008; Morris & Moore, 2000). Perhaps these behavioral changes would be greater if the counterfactual thinking were accompanied by doctored evidence and considered repeatedly over specific delays. Future research should examine this possibility.

Finally, our findings inform metacognitive models of memory (e.g., Jacoby & Kelley, 1987; Mazzoni & Kirsch, 2002). In the absence of a clear memory, we may turn to evidence in our environment—written documents, personal photos, and so on—to determine what happened. Our results suggest that metamemory plays a significant role here: Even a very brief delay between the target event and exposure to external evidence, or a simple sense of déjà vu, could be enough to make people doubt their memories and accept external evidence—even false evidence.