It is not unusual for people to mistakenly remember having done something (shut the window, put their keys in their bag) that they have not in fact done. People can be induced to falsely remember performing simple, everyday actions by imagining themselves performing those actions or by actually seeing someone else perform them (Goff & Roediger, 1998; Lindner, Echterhoff, Davidson, & Brand, 2010). However, such work has not examined the impact of nonvisual processes, such as hearing the sounds of actions, on these false action-memories. The present study explored whether sound alone, in the absence of direct visual cues, could trigger false action-memories and how false memories induced by sound compare with those induced by vision.

In the typical paradigm to investigate false action-memories, participants perform various simple actions (e.g., break the toothpick; roll the dice) in Phase 1. In Phase 2, they are presented with some of the actions that they performed earlier (e.g., break the toothpick) and some new (nonperformed) actions (e.g., pour water in the glass). The kind of processing (e.g., imagination, observation) in this second phase has been varied between and within studies. Goff and Roediger (1998) showed that imagining performing simple actions in Phase 2 inflated the number of false claims to have performed the actions oneself on a later, surprise source-memory test, the imagination-inflation effect (see also Garry, Manning, Loftus, & Sherman, 1996). Thus, people might falsely claim that they actually poured water into the glass in Phase 1 when in fact they only imagined performing this action in Phase 2 (Lampinen, Odegard, & Bullington, 2003; Thomas, Bulevich, & Loftus, 2003).

Additional work has shown that watching other people perform actions via short video-clips in Phase 2 can have similar effects (Lindner et al., 2010; Nash, Wade, & Brewer, 2009). Thus, watching someone else pour water into a glass can lead people to claim mistakenly that they themselves performed this action, dubbed the observation-inflation effect (Lindner, Schain, Kopietz, & Echterhoff, 2012; Schain, Lindner, Beck, & Echterhoff, 2012). Exposure to photographs of actions involving objects in their completed states in Phase 2 (e.g., seeing a photo of an empty water bottle with a full glass of water beside it) also induces false claims of having performed those actions, the photo-inflation effect (Henkel, 2011). Taken together, imagining oneself, observing another person, as well as looking at photos of completed actions all lead to false memories of having performed the action.

Obviously, there are important differences between these three processes. For instance, observing videos and viewing photos are relatively passive, whereas imagining is more active; observation inflation involves dynamic stimuli, whereas photo inflation involves static stimuli. However, there also is one noticeable constant: Creating mental images, watching video-clips, and viewing photos all involve the visual modality and result in visual memory traces, whether self-generated or actually perceived. Indeed, it has been shown that memory representations arising from imagination often are primarily visual in nature (Johnson, Foley, Suengas, & Raye, 1988). In line with this assumption, Marsh, Pezdek, and Tam (2014) recently showed that visual perspective during imagination alters imagination inflation for recent events. Moreover, source memory for self- vs. other-performance decreased when visual similarity between self- and other-performance increased (Hornstein & Mulligan, 2004). Therefore, considering the basic feature these three inflation phenomena have in common, a better term to characterize them might be visual(ization) inflation.

This conceptualization of a visual(ization) inflation effect is further corroborated by research highlighting the role of vision in perceptual illusions of agency and in memory illusions. For instance, if a rubber arm is lying in front of a person while his or her own arm is hidden, then viewing this arm being brushed (and simultaneously being brushed on the hidden arm) evokes the illusion of feeling the touch of the brush that is applied to the rubber arm and in addition generates an illusion of ownership of this arm (Botvinick & Cohen, 1998). Moreover, showing people a doctored photo of a childhood event led many to believe that they actually experienced the event (Wade, Garry, Read, & Lindsay, 2002), and showing people photographs of an unfamiliar scene increased beliefs that they had been to this location (Brown & Marsh, 2008).

Taken together, the reported work suggests a critical role of visual cues in the creation of false perceptions and memories of agency. At the same time, this research has largely neglected the potential impact of other sensory cues, such as auditory ones. However, in one study, participants viewed a video-clip of an artificial dummy head whose ear was caressed by a brush (Kitagawa & Igarashi, 2005). The sound of that action was recorded by a microphone in the ear of the dummy’s head. When it was played via headphones, people gave high ratings when asked to agree to the item, “I felt tickling on my own ear.” Thus, confusions in the perception of agency can arise on the basis of auditory cues alone. Accordingly, the intriguing question that arises here is whether just listening to the sound of an action—without seeing the action—can induce false memories of having performed the action oneself. Put differently: Is there a sound-inflation effect?

To address this, we had participants perform a series of actions and then (a) listen to the sound of someone performing actions, (b) watch someone performing actions, or (c) both listen to the sound and watch someone performing actions. Two weeks later, they had to remember which actions they actually performed.

According to the source-monitoring framework, people's judgments about the source of a memory are influenced by its phenomenal features and how they compare to features typical for memories derived from certain sources (Henkel & Carbuto, 2008; Johnson, Hashtroudi, & Lindsay, 1993; Johnson & Raye, 1981). Research on false action-memories has been in line with such an approach, emphasizing the role of sensory-perceptual cues in creating this kind of memory illusion (Henkel, 2011; Lampinen et al., 2003; Thomas et al., 2003). Feature importation gives rise to false beliefs and memories when features arising from one experience (e.g., imagination or observation) are used when evaluating another experience (e.g., “Did I really do this?”) (Henkel & Franklin, 1998; Lampinen, Meier, Arnal, & Leding, 2005; Lyle, Bloise, & Johnson, 2006).

The inflation phenomena known to date might specifically rely on a misattribution of vivid visual memory-traces generated through imagination or observation to self-performance. This approach can be easily applied to other sensory qualities: Just like visual memory-traces, acoustical memory-traces from listening to another’s actions could be misattributed to self-performance. Therefore, we expected to find a sound-inflation effect.

When one hears the sound of someone else performing an action, it not only creates acoustical memory traces, but past work on cross-modal imagery suggests it also might create visual memory traces (Spence & Deroy, 2013). That is, listening to the sound of someone pouring water might lead one to visualize the action as well. We included a self-report measure of concurrent imagery to investigate this possibility. Rather than potentially bias participants to consciously consider concurrent mental imagery while hearing or seeing the other person perform the actions, we had participants make a global assessment of concurrent imagery after they completed the task.

Method

Participants

Participants were 85 undergraduates from Fairfield University in Connecticut (19 men, 66 women). Ages ranged from 18 to 23 years (M = 19.21, SD = 1.07). One participant failed to return for the second session, and one was excluded because English was not the native language, leaving a total of 83 participants for analyses. Given an α-level of .05, this sample size allowed for high power (1 − β ≈ .93) to detect a medium-sized interaction (f = 0.25) between the two independent variables (for the within-subjects factor, r was set to .30).

Design

The study used a 2 (encoding in Phase 1: performed, not performed) × 2 (exposure status in Phase 2: exposed, not exposed) × 3 (type of processing in Phase 2: heard only, watched only, heard and watched) mixed-factorial design, with encoding and exposure status manipulated within-subjects, type of processing manipulated between-subjects, and proportion of false action-memories as the dependent variable.

Materials and procedure

Actions

A total of 48 actions were generated: 32 critical actions with distinctive sounds (e.g., sharpen the pencil, crack open the peanut) and 16 filler actions. For a given subject, 16 of the critical actions were performed, and 16 were not performed (i.e., not presented) in Phase 1. Of the 16 actions of each type, 8 were again encountered in Phase 2, and 8 were not. A total of four sets were created to counterbalance these actions across conditions. For the 16 filler actions, 8 were performed in Phase 1, and 8 were not. Of the 8 actions of each type, 4 were presented again in Phase 2, and 4 were not. Filler actions were not counterbalanced, and therefore reported results only rely on the critical actions.
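The allocation of critical actions described above can be sketched in a few lines. This is only an illustration: the action labels are placeholders, and the rotation used to build the four counterbalancing sets is an assumed Latin-square scheme, since the text states only that four sets were created.

```python
from collections import Counter

# Placeholder labels; the 32 critical actions themselves are not listed here.
critical = [f"critical_{i:02d}" for i in range(32)]

# The four within-subjects cells for critical actions (8 actions per cell).
CELLS = [
    ("performed", "exposed"),
    ("performed", "not exposed"),
    ("not performed", "exposed"),
    ("not performed", "not exposed"),
]


def assign_cells(actions, set_index):
    """Map each action to a (Phase 1, Phase 2) cell.

    Rotating groups of 8 actions through the cells by set_index is an
    assumed counterbalancing scheme, not one confirmed by the text.
    """
    groups = [actions[i * 8:(i + 1) * 8] for i in range(4)]
    assignment = {}
    for g, group in enumerate(groups):
        cell = CELLS[(g + set_index) % 4]
        for action in group:
            assignment[action] = cell
    return assignment


for s in range(4):
    counts = Counter(assign_cells(critical, s).values())
    print(f"set {s}:", dict(counts))
```

Under this assumed scheme, every set places 8 actions in each cell, and across the four sets each action serves once in every cell.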

Phase 1: Perform actions

Participants were tested individually at a table holding the objects for the 24 to-be-performed actions. They were told the study examined people's perceptions and thoughts about everyday actions. They were instructed that they would see an action statement for 5 seconds (e.g., fold the paper bag) and were to locate the object(s), place them in front of themselves, and then perform the action once. After the participant completed the action, the experimenter moved the objects back to their place on the table, and the next action statement appeared. The 16 critical and 8 filler actions were performed in random order.

Phase 2: Exposure to another's actions

Participants were told that the next task involved making judgments about actions that someone else performed. Subjects were randomly assigned to one of three experimental conditions in which they (a) only listened to the sound, (b) only watched, or (c) simultaneously both listened to and watched the person perform the actions. Their task was to count the number of times each action was performed.

Color videos depicted a young woman in her early 20s wearing a black top at a table performing a series of actions with objects (Fig. 1). The shot was from a second-person perspective, as if she were sitting across the table from the participant, and it showed from her midsection to her neck, focusing on her torso, arms, and hands. The table held only the object(s) needed for a given trial.

Fig. 1. Example of the video clips that were used: video representing the action “Pour the water in the glass.”

Videos were played with auditory input only, with visual input only, or with both auditory and visual input. Each trial started with an action statement shown for 3 seconds (this was mainly to reduce ambiguity about what people would then hear in the listen-only condition). This was followed by presentation of the action being performed two to five times (depending on the length of the corresponding action) for a total of 10 seconds, followed by a prompt for participants to indicate the number of times the action was performed during the trial. During the course of Phase 2, each of the 16 critical (8 formerly performed, 8 nonperformed) and 8 filler actions (4 formerly performed, 4 nonperformed) appeared on four separate trials, making a total of 96 trials in Phase 2. The trials were organized into four blocks, each containing all 24 actions (16 critical plus 8 filler) in a different randomized order, with the restriction that the last video of one block and the first video of the next were never identical.
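The restricted randomization of Phase 2 trials can be illustrated as follows. This is a minimal sketch, assuming that the 96 trials consist of four shuffled passes through the 24 actions and that an offending pass is simply reshuffled; the function name, action labels, and reshuffling strategy are hypothetical and not taken from the study's materials.

```python
import random
from collections import Counter


def build_phase2_order(actions, n_blocks=4, rng=None):
    """Return a trial order made of n_blocks shuffled passes through actions.

    A pass is reshuffled until its first action differs from the last action
    of the preceding pass, so no action is shown twice in a row.
    Assumes at least two distinct actions.
    """
    rng = rng or random.Random()
    order = []
    for _ in range(n_blocks):
        block = list(actions)
        rng.shuffle(block)
        while order and block[0] == order[-1]:
            rng.shuffle(block)
        order.extend(block)
    return order


# Placeholder labels for the 16 critical and 8 filler actions shown in Phase 2.
actions = [f"action_{i:02d}" for i in range(24)]
order = build_phase2_order(actions, rng=random.Random(0))

print(len(order))                    # total number of trials
print(set(Counter(order).values()))  # how often each action appears
```

Because each action appears exactly once per pass, the boundary check is enough to guarantee that the same video never appears on two consecutive trials.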

After all trials, participants rated how difficult they found the counting task to be (1 = not at all difficult; 7 = very difficult) and indicated the percentage of trials on which they created mental images of what the actions looked like or sounded like while doing the counting task.

Phase 3: Memory test

Two weeks later, participants were given a surprise memory test on which they read various action statements and responded to the 48 events by noting either “yes, I performed the action” or “no, I did not perform the action.” Instructions emphasized that the task explicitly asked about the actions they themselves actually did with the objects on the table.

The memory test comprised all 48 actions in random order: 16 critical actions performed in Phase 1 (8 to which participants had been exposed in Phase 2 and 8 to which they had not), 16 critical actions not performed in Phase 1 (8 exposed and 8 not exposed in Phase 2), and 16 filler actions (4 performed in Phase 1 and exposed in Phase 2, 4 performed in Phase 1 and not exposed in Phase 2, 4 not performed in Phase 1 and exposed in Phase 2, and 4 not performed in Phase 1 and not exposed in Phase 2).

Results

Alpha was set to .05; p values are reported two-tailed. For descriptive statistics, see Table 1.

Table 1 Proportion of times participants claimed to have performed actions as a function of hearing, watching, or both hearing and watching other people perform those actions in Phase 2

First, we analyzed the proportion of false action-memories within each of the three groups separately. An inflation effect would be seen when there were significantly more false claims of performing actions that participants were exposed to during Phase 2 but were not performed in Phase 1 than actions that were brand new (i.e., neither performed in Phase 1 nor exposed to in Phase 2). Indeed, this pattern was found in the listen-only condition, t(27) = 7.22, p < .001, d = 1.36, 95% CI [0.84, 1.88], in the watch-only condition, t(26) = 7.25, p < .001, d = 1.39, 95% CI [0.85, 1.92], as well as in the listen-and-watch condition, t(27) = 5.79, p < .001, d = 1.09, 95% CI [0.62, 1.56]. That is, inflation effects were found for all three types of processing.
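As a rough consistency check on these effect sizes, Cohen's d for a within-subjects comparison can be recovered from the t statistic. The sketch below assumes that d was computed as t divided by the square root of the group size (often written d_z); the text does not state which formula was used, so this is an assumption, and the function name is illustrative.

```python
import math


def cohens_d_from_paired_t(t_value, df):
    """Effect size for a paired t test, assuming d = t / sqrt(n) with n = df + 1."""
    n = df + 1
    return t_value / math.sqrt(n)


# (t, df, reported d) for each Phase 2 condition, as given in the text.
reported = {
    "listen only": (7.22, 27, 1.36),
    "watch only": (7.25, 26, 1.39),
    "listen and watch": (5.79, 27, 1.09),
}

for condition, (t_value, df, d_reported) in reported.items():
    d = cohens_d_from_paired_t(t_value, df)
    print(f"{condition}: recomputed d = {d:.3f}, reported d = {d_reported}")
```

Under this assumption, the recomputed values agree with the reported ones to within the rounding of the t statistics. The degrees of freedom also imply group sizes of 28, 27, and 28, which sum to the 83 participants analyzed.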

Examination of effect sizes indicated this increase was nearly equivalent in the listen-only (d = 1.36) and watch-only conditions (d = 1.39), whereas it was slightly lower in the listen-and-watch condition (d = 1.09). A 2 × 3 mixed ANOVA of status of exposure (exposed, not exposed) and type of processing (listen only, watch only, listen and watch) in Phase 2 on false memories of self-performance was run to check for significant differences in the size of the three inflation effects. Such differences should be reflected in an interaction indicating that exposure to another’s actions had a different impact within the three types of processing. However, no interaction occurred, F(2, 80) = 1.41, p = .250. Also, this analysis revealed no main effect of type of processing; that is, groups did not generally differ in their tendency to falsely claim actions as self-performed, F(2, 80) = 1.56, p = .217. As expected, this analysis yielded a main effect of exposure across conditions, as already found within conditions, F(1, 80) = 137.66, p < .001, ηp² = .632.

Neither performance on the secondary counting task nor perceived difficulty of the task differed between the three groups, Fs(2, 80) ≤ 1.58, ps ≥ .213. However, differences between groups emerged with regard to the self-reported percentage of trials on which mental imagery was generated during exposure to another’s actions, M = 70.36 in the listen-only, M = 52.96 in the watch-only, and M = 65.71 in the listen-and-watch condition; F(2, 80) = 3.77, p = .027, ηp² = .086. Bonferroni-corrected post hoc comparisons revealed a significantly higher rate of claiming to have generated mental imagery for the listen-only condition than for the watch-only condition (p = .029), with no other significant differences (ps ≥ .165).
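The partial eta-squared values reported in the Results can be recovered from the F statistics and their degrees of freedom with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the two reported effects; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of exposure: F(1, 80) = 137.66
exposure = partial_eta_squared(137.66, 1, 80)

# Group differences in self-reported imagery: F(2, 80) = 3.77
imagery = partial_eta_squared(3.77, 2, 80)

print(f"exposure: {exposure:.3f}")
print(f"imagery:  {imagery:.3f}")
```

Both results round to the values given in the text (.632 and .086).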

Discussion

Observing others or imagining oneself perform actions can make people believe that they have performed those actions themselves. We have proposed that this is consistent with a misattribution of visual cues and have termed these effects visual(ization) inflation. However, it was not yet known whether sound alone could make people claim they performed actions they did not perform. Our results show that it can: We demonstrated a sound-inflation effect. Moreover, merely listening to the sound of an action led to an increase in false memories comparable to that produced by watching an action or by simultaneously listening to and watching an action.

Our study was primarily designed to determine if there is a sound-inflation effect. This demonstration importantly extends our knowledge about potential sources of false action-memories. Even in the absence of any direct visual cue (e.g., when an event is out of sight), people will be prone to falsely remember that they actually performed an action themselves. Moreover, such false memories were as common as those triggered by vision (with or without additional sound).

According to the sensory-feature-importation account, imagination, observation, photo, and also sound inflation occur when vivid and easy-to-generate sensory representations are indistinguishable from sensory representations that would have arisen had one actually performed the actions. These mental representations undoubtedly can arise from multiple sources. Sound inflation may result from a misattribution of auditory representations created while listening to the action being performed. Our findings also suggest that concurrent imagery played a role. We suggest that visual information arising from spontaneous imagery generated while listening to someone else perform the actions also contributes to false action-memories based on our finding that while hearing the sound of an action, participants self-reported significantly higher rates of concurrent imagery than when they watched the action. Such cross-modal imagery (Spence & Deroy, 2013) is consistent with research in related areas showing that people were most likely to claim falsely to have seen an event when they had both visually imagined and actually heard the event (Henkel, Franklin, & Johnson, 2000). Due to the global nature of the rating task we used, however, we cannot state definitively that this is the case. Further research is needed with more fine-grained ratings about the modality of the concurrent imagery.

In addition, it is important to note that the mental representations arising from listening to other people perform events likely involve motor representations as well. Past work has shown that observing someone else performing an action leads to motor representations in the observer that are similar to those of self-performance and that can be reactivated at retrieval (Grèzes & Decety, 2001; Senkfor et al., 2002; Wutte, Glasauer, Jahn, & Flanagin, 2012). Moreover, sound alone is as capable as vision alone of inducing motor representations similar to actual performance (Alaerts, Swinnen, & Wenderoth, 2009; Caetano, Jousmäki, & Hari, 2007).

Yet, if the number of false action-memories were simply a function of the quantity of imported sensory and motor features, one might have expected fewer false memories in the unimodal conditions than in the bimodal condition. For instance, imagination-inflation research has shown that the more sensorily elaborated the mental images, the more false memories occurred (Thomas et al., 2003). Similarly, motor-simulation research has shown that bimodal perception (audition and vision) led to the highest amount of motor facilitation (indeed equalling the sum found after unimodal stimulation; Alaerts et al., 2009; Kaplan & Iacoboni, 2007). Why then is hearing alone or vision alone as likely to produce false memories as hearing and vision combined?

Sensory and motor representations arising from various encounters (observing someone, imagining oneself, seeing photos) may have to pass some threshold before they are mistakenly judged as having originated from one's own actions. Once that threshold is passed and the mental representation of the event has enough features to be mistakenly attributed to self-performance, additional cues may not always add to the effect in a cumulative way. Furthermore, there are likely many different combinations of features, and of mechanisms giving rise to those features, that can push a mental representation past that threshold. For instance, visual information may be weighted more heavily in such judgments (Posner, Nissen, & Klein, 1976), and the results from the current study suggest that concurrent visual imagery may contribute. In other words, sound inflation might primarily be another instance of visual(ization) inflation.

Because we exposed all participants to the action statements before the actions, it is also possible that mere exposure to the action statements, rather than sensory or motor features, produced the increase in false memories in each group. While familiarity likely contributes to the effect, prior research suggests that it alone is unlikely to fully account for such false memories. For example, in two experiments, Lindner et al. (2010) asked one group of participants to read action statements similar to the ones used here and another group to read the statements and then observe the corresponding videos in Phase 2. Reading alone did not significantly increase false memories, but reading plus observing the actions did. Other studies corroborate that simply reading the action statements typically does not produce the rate of false beliefs and memories found when rich perceptual details are evoked (Henkel, 2011; Thomas et al., 2003). Similarly, prior research suggests that the inflation phenomena known to date are not simply due to a response bias: Source-monitoring instructions and warnings provided at test did not alter the number of false memories in young participants (Lindner et al., 2010; Thomas & Bulevich, 2006).

In conclusion, this study provides clear evidence that people can be induced to falsely claim to have performed an action that they merely heard someone else perform. Further research is needed to disentangle the specific mechanisms behind this memory illusion and the relative contributions of misattributed auditory features and/or misattributed visual and motor features triggered by sound. Studies that more directly manipulate the generation of sensory and motor features while participants listen to another’s actions are needed to draw firm conclusions about the mechanisms underlying sound inflation. Such work might include ratings of concurrent imagery that specify the modality of the imagery and that are made while participants are engaged in each type of perceptual processing rather than after completion of the task. In addition, future work could use secondary tasks that make concurrent imagery more or less likely (e.g., engaging visual, auditory, or motor systems). Once we know how these kinds of false memories emerge, we can find ways to reduce these potentially costly errors.