Causality and continuity close the gaps in event representations

Kominsky, Jonathan F.; Baker, Lewis; Keil, Frank C.; Strickland, Brent

doi:10.3758/s13421-020-01102-9

Published: 06 October 2020

Memory & Cognition, Volume 49, pages 518–531 (2021)

Abstract

Imagine you see a video of someone pulling back their leg to kick a soccer ball, and then a soccer ball soaring toward a goal. You would likely infer that these scenes are two parts of the same event, and this inference would likely cause you to remember having seen the moment the person kicked the soccer ball, even if that information was never actually presented (Strickland & Keil, 2011, Cognition, 121[3], 409–415). What cues trigger people to "fill in" causal events from incomplete information? Is it due to the experience they have had with soccer balls being kicked toward goals? Is it the visual similarity of the object in both halves of the video? Or is it the mere spatiotemporal continuity of the event? In three experiments, we tested these different potential mechanisms underlying the "filling-in" effect. Experiment 1 showed that filling in occurs equally in familiar and unfamiliar contexts, indicating that familiarity with specific event schemas is unnecessary to trigger false memory. Experiment 2 showed that the visible continuation of a launched object’s trajectory is all that is required to trigger filling in, regardless of other occurrences in the second half of the scene. Finally, Experiment 3, which used naturalistic videos, found that the filling-in effect is affected more strongly when the object’s trajectory is discontinuous in space/time than when the object undergoes a noticeable transformation. Together, these findings indicate that the spontaneous formation of causal event representations is driven by object representation systems that prioritize spatiotemporal information over other object features.

Processing scenes and events in real time requires a balance between accuracy and speed. We extract useful bits of information quickly and reliably, but limited processing resources make it impossible to attend to all available information in a timely fashion. A primary strategy for dealing with this problem is to employ specialized perceptual and inferential mechanisms that serve to prioritize certain types of information to the detriment of others.

In change blindness paradigms, for example, large changes to background objects can go entirely unnoticed (Jensen, Yao, Street, & Simons, 2011), while changes to prioritized categories of objects are quite likely to be detected. For instance, changes to animate entities are easier to detect than changes to obviously inert objects (New & Scholl, 2009).

Just as object representations boost attention and memory in useful ways, there is also evidence that event representations serve the same function. Memory for token instances of events is heightened at event boundaries (i.e., the moment at which one event transitions to another; Baker & Levin, 2015; Zacks & Swallow, 2007). In addition to event tokens, representations of event types, such as containment versus occlusion, modulate dynamic attention and memory toward object properties that help predict event-specific outcomes in both infants and adults (Baillargeon & Wang, 2002; Strickland & Scholl, 2015).

What principles predict how information will be prioritized and stored during the perception of events? A broad way of characterizing the processes of visual cognition is as making intelligent (albeit likely unconscious) inferences regarding the nature of unfolding events (von Helmholtz, 1867). These inferences then have reflex-like consequences for attention and memory. Given the time constraints inherent to event processing and the often noisy and incomplete nature of incoming information, the mind must employ a set of heuristics that usually deliver accurate inferences but can go awry in carefully crafted laboratory settings. Here, we concentrate specifically on the perception of causal events as a way of exploring in detail such a set of heuristics.

Causal events are interesting from this perspective because recent work suggests that there are highly specialized visual routines dedicated to the detection of causal events, leading to retinotopically specific visual adaptation to “causal launching” (Kominsky & Scholl, 2020; Rolfs, Dambacher, & Cavanagh, 2013). Moreover, simple causal events automatically guide visual attention toward category-relevant information in both adults and infants (Kominsky et al., 2017). Thus, prior empirical evidence suggests that causal perception provides a potentially fruitful test case in which to study rapid heuristics in event processing.

In contrast to the aforementioned work, which concentrates on very simple Michottean “causal launching displays” (Michotte, 1946/1963), containing only simple geometric shapes moving in linear trajectories, the current project instead investigates the perception of more complex events, of the type we are likely to experience in our everyday lives. We employ a false-memory paradigm established by Strickland and Keil (2011) as an indirect way of asking how, in real time, heuristics are employed to create causal impressions that then trigger the creation of false memories.

Strickland and Keil (2011) showed observers simple video clips in which an agent launches an object (e.g., shoots a basketball). Crucially, these videos never actually showed the moment of contact or release in such launching events. After witnessing these “incomplete events,” the video either showed footage that implied that the launching had occurred (e.g., footage showing the resulting trajectory of a basketball toward a hoop) or did not imply causality (e.g., footage showing a person walking on a basketball court). Participants falsely reported seeing the moment of contact or release (e.g., the moment of release of the basketball) significantly more in the causal implication conditions. More recent work has gone on to demonstrate that these effects are impervious to many plausible “top down” influences, suggesting that the phenomenon is indeed driven by perceptual heuristics with specific triggers as opposed to rich background knowledge. Thus, explicit knowledge that false memory is being tested does not disrupt the effect (Papenmeier, Brockhoff, & Huff, 2019), nor does the filling-in effect vary as a function of expertise with the specific type of video being shown (Brockhoff, Huff, Mauer, & Papenmeier, 2016). For the current purposes, we use this paradigm to allow us to examine, via false-memory creation, the elements that trigger the creation of coherent causal event representations from incomplete information.

Across three experiments, we assessed three factors that could reasonably influence these dynamic impressions of causality: (1) Event familiarity—Perhaps we spontaneously "fill in the causal blanks" only when there is a familiar specific event schema available in memory (e.g., shooting a basketball toward a basketball hoop). (2) Object identity—Perhaps we spontaneously fill in the causal blanks only when a launched object is perceptibly identical prelaunch and postlaunch (e.g., seeing the trajectory of a football may not fill in the blanks for shooting a basketball). (3) Spatiotemporal continuity—Perhaps we spontaneously fill in the causal blanks if and only if the trajectory of the object appears compatible with basic physical principles such as the Spelke principle (Spelke, Breinlinger, Macomber, & Jacobson, 1992), that objects should follow continuous paths through space and time.

We discuss the logic and plausibility of each factor in the relevant introductions to the individual experiments that follow. To foreshadow, the event familiarity and object identity hypotheses (Hypotheses 1 and 2) are refuted by our data, but we find evidence supporting Hypothesis 3, the spatiotemporal continuity hypothesis.

Experiment 1

Experiment 1 concentrated primarily on the event familiarity hypothesis by investigating the role of familiar event schemas. Schemas, in this context, are semantic representations in long-term memory that are used to make predictions about the outcome of an event given inferences about goals and previously observed events of the same kind (Zacks, Speer, Swallow, Braver, & Reynolds, 2007). The stimuli used in Strickland and Keil (2011) fit into highly familiar schemas, such as “shooting a basketball toward a hoop” or “kicking a soccer ball toward a goal.” Participants’ false memory for the moment of release or contact in these events could be driven by their extensive and specific semantic knowledge about these events rather than a more general process of event representation, especially given that recent work has shown that such schemas can support filling in incomplete information from many different parts of an event (Kosie & Baldwin, 2019). With the stimuli used by Strickland and Keil (2011), one could even rely on the schemas that involve the mechanics of the human body. Past work has found that we demonstrate better memory for sequences that follow plausible bodily mechanics (Lasher, 1981), and that recognizable preparatory motions by agents draw attention and seem to support rich predictions (Cohn, Paczynski, & Kutas, 2017).

To test the event familiarity hypothesis, Experiment 1 attempted to replicate Strickland and Keil (2011) using entirely novel launching events, constructed in a three-dimensional animated environment, with unfamiliar beginnings as well as unfamiliar outcomes, involving no human actors. While these events were novel to participants, they still had immediately recognizable causal content: They involved either straightforward launching events or “launching-by-expulsion” (Michotte, 1946/1963). In other words, even though the setting and objects were unfamiliar, the predominant causal interaction was of an abstract type that even infants reliably perceive as causal by 6 months of age (Saxe & Carey, 2006).

One could possibly argue that these “novel” events are not truly novel, as by adulthood people have ample experience seeing such events as causal. Even in the absence of identifiable agents, it may be possible to recognize the overall structure of a “preparatory action” and a “coda,” and fill in the missing link from that (Cohn et al., 2017). However, the point of these videos was not to introduce a causal relationship that was so unfamiliar that it had to be learned. Rather, it was to create novel instances of the same (implied) abstract causal relationships that are automatically extracted from the world through perceptual mechanisms (Hubbard, 2013; Kominsky & Scholl, 2020; Michotte, 1946/1963; Rolfs et al., 2013), while ensuring that there is a lack of conceptual familiarity with the basic category of event on display.

Even more precisely, the question explored in Experiment 1 was whether that abstract causal content alone, with no specific familiar context or schema, would be sufficient to produce the filling-in effect. A causal implication condition was contrasted with a second condition in which there was no causal implication, as in Strickland and Keil (2011).

The predicted result, following from Strickland and Keil (2011), was that participants should be more likely to fill in a moment of release or contact that they never saw when the incomplete launch is followed by a causal implication. While we argue that this is due to causal implication enabling participants to construct a complete event representation, another possibility is that the videos lacking causal implication actually disrupt memory for the events immediately preceding their onset. That is, reports of having seen the moment of release or contact could be depressed merely by the disruptive nature of the non sequitur videos.

To test this alternative explanation, we added a third condition in which there was no causal implication, but the moment of contact/release was actually presented. If participants do not report seeing the moment of release in this condition, it indicates that the lack of causal implication is disruptive, rather than the presence of causal implication being constructive. If, however, participants show accurate memory for the moment of release when it is actually present, then it supports our proposal that causal implication drives this filling-in effect.

Method

Experiment 1 was conducted at the Institut Jean Nicod and ruled exempt from review by the CERES IRB board in Paris, France.

Participants

We conducted pilot experiments for Experiments 1 and 2 (see the Supplemental Materials). Based on Strickland and Keil (2011), which used six videos per participant and roughly 15 participants per condition, these pilot experiments had four videos per participant but 30 participants per group (thus double the sample size). We then conducted a power analysis based on the weakest intergroup effect on target items in Pilot Experiment 1 (see Supplemental Materials), and determined that to reach 95% power to detect that effect we would need 68 participants per group. Due to imperfect randomization, we ended up slightly exceeding this target.

Participants (N = 206) were recruited via Amazon’s Mechanical Turk (for more information on this population, see Germine et al., 2012; Paolacci, Chandler, & Ipeirotis, 2010) for $0.75 compensation for an approximately 5-minute task. Participants were randomly assigned to one of three conditions, described below. An additional 11 participants were recruited but not included in the final sample due to violating preregistered exclusion criteria (see Results).

Materials and procedure

Four novel movies were animated using the 3D editing software, Blender (v2.66; The Blender Foundation, www.blender.org, 2015). Stimuli are available to view (https://osf.io/mjwkd/). The animations depicted novel (and thus unfamiliar) event types, in each case following the same progression of three shots (depicted in Fig. 1): The first shot showed all objects and items in the environment, the second initialized the movement of a ball leading to a launching action, and the third showed the consequences of the launching event on objects on the other side of the environment.

Participants watched four videos from one of three randomly assigned conditions (each participant only saw videos from one condition). The causal condition depicted an implied object release that cut to a causally consistent shot of the launched object continuing on an expected trajectory towards a target and having some impact on an object on the other side of the space. Importantly, the moment of release was never actually shown in this condition. The non sequitur condition did not show the launched object in the second shot of the video, but instead showed the occurrence of an unlikely event. For example, as illustrated in Fig. 1, one video implied a ball getting hit by a bar. In the causal condition, the following shot showed the ball hitting cylinders on a platform and knocking one of them over (an effect of the ball’s trajectory), whereas in the non sequitur condition, the ball was not present, and the video instead depicted cylinders moving up and down like pistons. Neither the causal nor the non sequitur conditions actually showed the launching action. As a control to verify that the non sequitur conclusion was not simply disrupting attention to the end of the first half of the video, a third group of participants saw a non sequitur complete condition, where participants actually saw the launching action (the “moment of release” or “moment of contact”) leading to the causally unpredictable conclusion.

After viewing each video, participants saw 10 images and were asked to indicate whether each image had appeared in the preceding video. There were three image types: images of the implied but unseen moment of release (target moment-of-release images; one per video), images taken from the preceding video for which the correct answer was “yes” (seen-action images; six per video), and images taken from a video with highly salient changes for which the correct answer was “no” (lures; e.g., a picture of the scene in which a central object was a different color; three per video). All video and picture orders were randomized.

Results and discussion

A total of 11 participants were removed and replaced through new recruitment for reaching less than 50% accuracy across all nontarget recognition items (computed as the average of the average accuracy for seen items and the average accuracy for lures, to compensate for the uneven number of items of each type): six from the causal/incomplete condition, two from the non sequitur (NS)/incomplete condition, and three from the non sequitur/complete condition. In addition, due to imperfect randomization, participant assignment was slightly unbalanced, with two extra participants in the non sequitur/incomplete condition. This left 68 participants in the causal/incomplete condition, 70 in the non sequitur/incomplete condition, and 68 in the non sequitur/complete condition.
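The exclusion rule described above can be illustrated with a minimal Python sketch. Only the rule itself (the unweighted mean of seen-image accuracy and lure accuracy, with exclusion below 50%) comes from the text; the function names and the example response patterns are hypothetical and are not taken from the study's materials or analysis scripts.

```python
from statistics import mean

def nontarget_accuracy(seen_yes, lure_yes):
    """Unweighted mean of seen-image accuracy and lure accuracy.

    seen_yes: list of bools, True = answered "yes" (correct for seen images)
    lure_yes: list of bools, True = answered "yes" (incorrect for lures)
    """
    seen_acc = mean(1.0 if r else 0.0 for r in seen_yes)
    lure_acc = mean(0.0 if r else 1.0 for r in lure_yes)
    return (seen_acc + lure_acc) / 2

def should_exclude(seen_yes, lure_yes, threshold=0.5):
    """Exclude when nontarget accuracy falls strictly below the threshold."""
    return nontarget_accuracy(seen_yes, lure_yes) < threshold

# Hypothetical participant who answers "yes" to every nontarget item
# (4 videos x 6 seen images = 24, 4 videos x 3 lures = 12).
always_yes = nontarget_accuracy([True] * 24, [True] * 12)

# Hypothetical participant: "yes" to half the seen images and to 8 of 12 lures.
poor = nontarget_accuracy([True] * 12 + [False] * 12, [True] * 8 + [False] * 4)
```

Averaging the two per-type accuracies, rather than pooling all 36 items, means that a participant who simply answers "yes" to everything scores 0.5 instead of 24/36, which is presumably why the text describes the computation as compensating for the uneven number of items of each type.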

The key dependent variable (DV) was the proportion of “yes” responses to the test items averaged across all four events for each participant. Our preregistered analyses started with a 3 (condition: causal incomplete vs. NS incomplete vs. NS complete; between-subjects) × 3 (item type: target vs. seen-image vs. lure; within-subjects) mixed-model analysis of variance (ANOVA) using R’s afex package (Singmann et al., 2019). This analysis found main effects of condition, F(2, 203) = 22.96, p < .001, and item type, F(2, 406) = 359.83, p < .001, as well as a significant interaction, F(2, 406) = 30.71, p < .001. We then conducted separate preregistered one-way between-subjects ANOVAs examining the effect of condition for each item type.
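As an illustration of how the DV could be computed for one participant, the following Python sketch takes the proportion of "yes" responses per item type within each video and then averages across videos. The data layout, function name, and example responses are assumptions made for illustration; the analyses reported here were run in R, and this is not the authors' code.

```python
from collections import defaultdict
from statistics import mean

def yes_proportions(trials):
    """Return {item_type: proportion of "yes"}, averaged across videos.

    trials: iterable of (video_id, item_type, said_yes) tuples
            for ONE participant (hypothetical layout).
    """
    # tally[(video_id, item_type)] = [number of "yes" responses, number of items]
    tally = defaultdict(lambda: [0, 0])
    for video_id, item_type, said_yes in trials:
        tally[(video_id, item_type)][0] += int(said_yes)
        tally[(video_id, item_type)][1] += 1

    # Collect one proportion per video for each item type.
    per_video = defaultdict(list)
    for (video_id, item_type), (n_yes, n_items) in tally.items():
        per_video[item_type].append(n_yes / n_items)

    return {item_type: mean(props) for item_type, props in per_video.items()}

# Hypothetical participant: says "yes" to the target image on 3 of 4 videos,
# accepts every seen image, and rejects every lure.
trials = []
for video in range(4):
    trials.append((video, "target", video < 3))
    trials += [(video, "seen", True)] * 6
    trials += [(video, "lure", False)] * 3

dv = yes_proportions(trials)
```

Each participant then contributes one value per item type, which is the unit that enters the between-subjects comparison of conditions.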

We first examined the item type of primary interest, the target moment-of-release image. Note that “yes” (recognition) responses in the incomplete conditions were false memory of an implied image, whereas recognition in the non sequitur complete condition was an accurate memory of an action seen by the participants. The results can be found in Fig. 2.

A one-way ANOVA of the effect of condition on average moment-of-release recognition responses was significant, F(2, 203) = 46.80, p < .001, η_p² = .316. Post hoc Tukey HSDs confirmed the impression provided by Fig. 2: There were significant differences between the non sequitur/incomplete condition (M = 38.2%, SD = 33.2) and the causal/incomplete condition (M = 71.3%, SD = 27.1), as well as between the non sequitur/incomplete condition and the non sequitur/complete condition (M = 81.6%, SD = 21.0), ps < .001. However, there was no significant difference between the causal/incomplete condition and the non sequitur/complete condition, p = .078.^{Footnote 1}
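The reported effect size can be checked against the F statistic using the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies it to the values reported above; the function name is ours, and this is a consistency check rather than a reproduction of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Values reported for the target moment-of-release items: F(2, 203) = 46.80
eta_target = round(partial_eta_squared(46.80, 2, 203), 3)  # 0.316
```

The recovered value matches the reported η_p² = .316.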

In short, participants were highly likely to correctly recognize that they saw the moment of release in the non sequitur/complete condition and to falsely recognize an implied moment of release in the causal/incomplete condition, but much less likely to make the same error in the non sequitur/incomplete condition. (These findings were nearly identical to what we observed in the pilot experiment.)

The corresponding ANOVAs for the seen-image and lure item types found no significant effects of condition on the rate of “yes” responses, F(2, 203) = 1.78, p = .17, η_p² = .02 and F(2, 203) = 0.11, p = .9, η_p² = .001, respectively. The full pattern of responses can be found in Table 1.

Experiment 1 extends the results of Strickland and Keil (2011) to completely novel events that are not supported by familiar schemas, or even somewhat abstracted schemas having to do with bodily motion (Lasher, 1981). Participants “filled in” the moment of release for events that they had never seen before, in completely unfamiliar contexts, provided there was spatiotemporal continuity and a causal consequence. Furthermore, we were able to rule out the deflationary explanation that the non sequitur event was simply distracting and thus disrupted memory around the moment of release: Participants had no difficulty recognizing that they had seen the moment of release when it was actually presented, even when followed by a non sequitur event. These findings support the hypothesis that false recognition of an implied action relies upon causal inferences (likely guided by spatiotemporal information), but not upon highly specific semantic event schemas.

Table 1 Average percentage “yes” responses to moment-of-release images, seen action images, and lure images in all conditions in Experiment 1

Experiment 2

In the causal implication condition of Experiment 1, the perceived motion after (implied) contact/release was always of the causally relevant (i.e., launched) object interacting further with the scene (e.g., knocking over a cylinder). That is, in addition to the trajectory of the object, the relevant object was involved in a further causal interaction in the causal condition of Experiment 1. In the non sequitur condition, participants instead saw a causally irrelevant event that could not be caused by the launched object or the launching event (e.g., pistons pumping up and down). Thus, the absence of a causally relevant object was confounded with the presence of a causally irrelevant event. It is therefore impossible to determine whether causal impressions were created by seeing causally relevant object motion in the second half of the video (i.e., the launched object having some further causal interaction), or inhibited by the presence of an event that could not be caused by the launched object in any way.

Experiment 2 examined this issue explicitly by replicating and extending the findings from Experiment 1. In this experiment, we crossed the presence/absence of the object’s motion with the presence/absence of a secondary event that could have been a consequence of the object’s subsequent trajectory. By presenting the effect of the ball separate from its movement, we created a case which would allow us to assess more precisely the types of causal information required to trigger filling in: If any causal schema is enough, then the causal consequence should be the factor that determines the filling-in effect. However, under the object identity and spatiotemporal continuity hypotheses, the presence or absence of the object in the second half of the event should be the determining factor.