Observing and comprehending streams of events and sequences of activities plays a major role in everyday life (Zacks, Kumar, Abrams & Mehta, 2009; Zacks, Tversky & Iyer, 2001). However, an event often is witnessed not in a complete but in a piecemeal and discontinuous manner, with temporal gaps interrupting its continuous flow. For example, this is the case when one monitors an ongoing event or activity by paying attention to it only from time to time (e.g., supervising a playing child) or when switching one’s attention back and forth between simultaneous events (e.g., observing the activities of different people preparing for a party). Also in films, events often are shown in a piecemeal, elliptical manner, leaving out portions of the event stream for reasons of economy of presentation (Schwan & Garsoffky, 2004; Schwan, 2013).

As research shows, observers cognitively fill in event gaps by inference and extrapolation, falsely remembering having seen parts of an event sequence that they in fact had not actually seen. Observers may infer events or activities that have occurred in a gap, particularly if the omitted events are related causally to the presented event segments (Strickland & Keil, 2011) or if viewers are familiar with the sequence of events (Pittenger & Jenkins, 1979). Also, research on representational momentum has demonstrated that observers tend to extrapolate the course of an event or activity beyond the end of its presentation (Freyd & Finke, 1984; Hubbard, 2005). Inference and extrapolation can be seen as a fault of memory or, conversely, mechanisms to build up a coherent representation of what has happened when observation was discontinuous. The research presented concentrates on gaps that arise as viewers infer the time that has passed just between two presented actions. In general, this phenomenon of event completion indicates that in cases of piecemeal observations, viewers regard those portions of an event that they have witnessed as samples from an underlying, continuous stream of events. Therefore, an observed portion of an event that is flanked by temporal gaps may be seen as standing for a more or less extended segment of the underlying event stream in a metonymic pars pro toto manner. This corresponds to the rhetorical figure where the observed portion of an issue together with the not observed portion of the issue forms an entity, and the presented part is a placeholder for the whole entity (as for example “England” for “Great Britain”). So, the observed portion of an event is a section of the longer lasting event and can serve as a placeholder for a larger segment in time.

Additionally, research on event segmentation has demonstrated that observers are flexible regarding the size of event segments. Depending on various factors, the very same stream of events may be segmented differently, ranging from a large set of fine-grained event units to a small set of coarse-grained ones (Zacks et al., 2001, 2009). This flexibility stems from the partonomic nature of events that allows for aggregating lower level basic actions into larger middle level units, which in turn can be aggregated further to higher level units (Vallacher & Wegner, 1989; Zacks et al., 2001). Hence, depending on segmentation, the very same activity may be conceived as a whole event unit in itself or as part of an event unit encompassing a (smaller or larger) number of different activities. This may not only hold for the perception of continuous streams of events, but similar mechanisms also may apply to discontinuous sequences of event parts interrupted by temporal gaps. That is to say, the very same part of an event should be perceived differently if it is flanked by gaps indicating the omission of either short or long event parts, because observers conceive it as standing pars pro toto for event segments of differing size.

Such differences in perceived event structure should manifest in event duration estimates (Boltz, 2005; Liverence & Scholl, 2012). More specifically, estimates of how long it takes in real life to perform a witnessed activity should increase with the increasing size of the flanking temporal gaps, indicating that the activity is interpreted as part of a more or less extended segment of the underlying stream of events. To test this hypothesis, in Experiment 1 viewers estimated how long an activity depicted in a film shot would take in real life. We hypothesized that based on their prior knowledge of events and activities (Burt, 1993; Burt & Kemp, 1991), viewers will estimate the real-life duration of an activity preceded by a large temporal gap as being longer than the real-life duration of the same activity following a brief temporal gap.

Furthermore, estimating how long a witnessed activity was observed also should be influenced by how long the observed activity would take in real life. Physical or social judgments often are influenced by anchors with extreme distance on the important dimension of the stimulus that is to be judged in the sense of a contrast effect (Sherif, Taub, & Hovland, 1958; Smither, Reilly, & Buda, 1988). Concerning piecemeal event depictions, we hypothesize that the presentation duration of an activity should be underestimated because in real life it takes longer, leading to a contrast effect. Furthermore, this effect should be all the more pronounced, the longer the duration of the event is that the presentation stands for. Therefore, the very same activity presentation should be estimated as being shorter if it is preceded by a larger temporal gap, indicating a more extended event segment. Furthermore, general models of prospective time perception (Zakay & Block, 1997) predict that time spans are estimated to be shorter if a simultaneous task is cognitively more demanding, because less cognitive resources are available for counting time pulses, that is, the perception of duration. Concerning event perception, it can be assumed that the processing of larger temporal gaps is cognitively more resource consuming than the processing of shorter temporal gaps. To test this hypothesis, in Experiment 2 viewers estimated how long a particular activity was presented to them.

In comparison, the two experiments focus on two different aspects of duration perception during event cognition: first, gaps in the time structure of continuous events should be filled in by viewers, because viewers abstract the presented event from the concrete temporal characteristics of the presentation and try to build a coherent representation of the underlying stream of events even if the event is presented in a fragmented way (Burt, 1993; Zacks et al., 2001). This was tested in Experiment 1. Second, how an event is understood should influence the perception of the concrete temporal characteristics of the presentation, because it may serve as the background against which formal features are interpreted (for example in the sense of an anchor effect, Sherif et al., 1958; Smither et al., 1988). This was tested in Experiment 2.

In sum, both experiments presented short videos of activity sequences that contained either short or long temporal gaps. In Experiment 1, viewers estimated how long the activity depicted in the last film shot takes in reality, whereas in Experiment 2 viewers estimated the duration of the film shot depicting the activity. We hypothesized that activities preceded by large temporal gaps are estimated to take longer in real-life but are estimated to have been presented for a shorter time span than the very same activities preceded by short temporal gaps.

Experiment 1: Estimating real-life duration of presented activities

Method

Participants

Voluntary participants (N = 83, 58 females, M age = 25.88 years, SD age = 8.80) signed an informed consent. An analysis with G*Power (Faul, Erdfelder, Lang, & Buchner, 2007) revealed that a total sample size of N = 77 is required for an effect size d = 0.38 (found in a pre-study using one half of the experimental film clips and testing 38 participants) and an error probability α = 0.05 / beta = 0.05.

Stimulus material

Amateur actors were filmed while performing 24 everyday activities consisting of a sequence of different actions, for example, having breakfast or cleaning a bicycle. For each activity, two film clip versions were produced: a “long” and a “short temporal gap” version. Each of the 48 film clips consisted of four shots separated by cuts, each shot lasting 3 seconds. The two film clip versions of the same activity showed the same actor or actress, both presenting the very same last shot (referred to as “target action” in the following) but differed with regard to the preceding three shots (Table 1; Fig. 1). In the “long temporal gap” version, the action shown in the target shot was preceded by three shots showing actions farther apart from each other in the underlying stream of events of the activity, implying broader temporal gaps between the shots. For example, the film clip “in the morning” presents in the long gap version the actor taking clothes out of the wardrobe, washing his face, putting a book in a sling bag, and then as target shot, buttering a slice of bread. In the “short temporal gap” version, the action shown in the target shot was preceded by three shots showing actions closer to each other in the underlying stream of events, implying smaller temporal gaps between the shots. For example, buttering a slice of bread was preceded by pouring out juice into a glass, toasting bread, and pouring milk into the coffee. Keep in mind that the shots were separated by straight cuts and therefore followed each other immediately without any visual gaps (i.e., no blank screens). (All clips are online https://osf.io/8tn4q.)

Table 1 Experimental film clips
Fig. 1
figure 1

Schematic structure of the two film versions edited for each event and used as stimulus material in both studies. In Experiment 1, participants estimated the duration of the activity presented in the last film shot (target action, dotted) as if this activity were to be executed in everyday life. In Experiment 2, participants estimated the presentation duration of the last film shot itself

Procedure and design

We conducted group sessions with up to four participants using four Mac Minis, controlled by IWM-Study, a program developed in our institute´s Media Technology and Development Department. Each participant watched all 48 film clips grouped into two blocks. Each block presented all 24 activities: 12 in the long and 12 in the short temporal gap version in randomized order. The order of the long versus the short gap version across blocks was counterbalanced among participants. If a participant saw an activity in the long gap version in the first block, he or she saw the same activity in the short gap version in the second block and vice versa. To minimize memory effects, blocks were presented in separate sessions (delay of 4-11 days).

Immediately after each film clip, participants wrote in a box on the screen how long in their opinion it would take to perform the action presented in the last shot of the film clip in everyday life. Participants were free to provide their estimations in seconds, minutes, or hours. To prevent participants from focusing only on the last shot and to have them instead follow the whole film clip, they were instructed to pay attention to the whole film clips, because at the end of each session a memory task would follow. Therefore, after seeing the 24 film clips of one session, participants were asked to write down a brief description of the content of each film clip (these data were not analyzed).

Results

All data (including R-scripts) have been made publicly available via the Open Science Framework and can be accessed at https://osf.io/ 8tn4q/. Data of one participant had to be eliminated completely, because she did not follow the instructions.

All duration estimations were transformed into seconds. A mixed linear model with the fixed factor “length of ellipsis” and the two random factors “participants” (intercept and slope) and “stimuli” (intercept and slope) was fitted to the data. We submitted the resulting model to the Anova() function of the car package (Fox & Weisberg, 2010). Participants judged the duration of the last action in real-life to be longer after long ellipses (on average 620.00 sec, SEM = 180.07) than after short ellipses (on average 413.44 sec, SEM = 161.38), χ 2(1) = 7.47, p = 0.006. Additionally, for descriptive reasons we calculated for each participant and each target action the ratio (duration estimation long gap version – duration estimation short gap version) / (duration estimation long gap version + duration estimation short gap version). For 23 of 24 target actions, the mean ratio was positive (Fig. 2).

Fig. 2
figure 2

In Experiment 1, for each participant and each target action, the ratio (duration estimation long gap version – duration estimation short gap version) / (duration estimation long gap version + duration estimation short gap version) was calculated. For 23 of 24 target actions, the mean ratio was positive

Discussion

These results indicate that if an observed action is preceded by a temporal gap, viewers interpret it in a pars pro toto manner as a placeholder for an activity segment that extends beyond the observed part of the event. As hypothesized, participants rated the time span it takes to perform the target action in real-life as significantly longer if it was preceded by a large gap in the underlying activity stream than if it was preceded by a short gap.

Experiment 2: Estimating presentation time of activities

Results of Experiment 1 indicate that viewers bridge gaps in the depiction of events and that they process shots preceded by a jump in time as standing pars pro toto for larger segments in the stream of events. But what does this mean for the perceived duration of the depiction itself? To examine this question, in Experiment 2 participants were asked to estimate the presentation duration of the last shot of the film, depicting the target action.

Method

This experiment was pre-registered (https://osf.io/4ecem/register/5771ca429ad5a1020de2872e).

Participants

Voluntary participants (N = 91, 66 females; M age = 23.3 years, SD = 5.15; for two participants no data concerning age and sex were available because the computer stopped during the experimental procedure) signed an informed consent. An analysis with G*Power (Faul et al., 2007) revealed that for an effect size d = 0.39 (found in another pre-study using one half of the experimental film clips and testing 44 participants) and an error probability α = 0.05 / beta = 0.05 a total sample size of N = 73 is required.

Stimulus material, procedure and design

Experiment 2 differed from Experiment 1 in two ways: First, for each film clip, participants were instructed to estimate the duration of the last shot presented in the film clip (target shot) by pressing the space bar for this length of time. Second, every participant saw all 48 film clips in randomized order in one single session.

Results

Data of two participants were incomplete because the computers stopped during the experimental session and therefore data of these the two participants were eliminated completely. Data of another 16 participants had to be eliminated completely, because they had already participated in another study using the same film material. Thus, datasets of 73 participants were analyzed further.

All duration estimations were measured in seconds. Based on the boxplot criterion for outlier detection (Tukey, 1977), we excluded all trials with a duration longer than 4,616 msec from the analysis (175/3,504 trials, 4.99%). A mixed linear model with the fixed factor “length of ellipsis” and the two random factors “participants” (intercept and slope) and “stimuli” (intercept and slope) was conducted. We submitted this model to the Anova() function of the car package (Fox & Weisberg, 2010). As predicted, duration estimation was longer in the condition with the short ellipses (M = 1.96 sec, SEM = 0.10) compared with the condition with the long ellipses (M = 2.05 sec, SEM = 0.10), χ 2(1) = 10.72, p = 0.001.

Additionally, for descriptive reasons we calculated for each participant and each target action the difference (duration estimation short gap version – duration estimation long gap version). For 21 of 24 target actions, the mean ratio was positive (Fig. 3).

Fig. 3
figure 3

In Experiment 2, for each participant and each target action, the difference (duration estimation short gap version – duration estimation long gap version) was calculated. For 21 of 24 target actions, the mean difference was positive

Discussion

Using the same experimental material that clearly showed a significant effect of temporal gap length on narrated time in Experiment 1, Experiment 2 found a significant influence of temporal gap length on narrative time. Viewers judged the length of film scenes to be longer if they were preceded by short temporal gaps than if they were preceded by long temporal gaps.

General discussion

The results of the present study indicate that if an observed action is preceded by a temporal gap, viewers interpret it in a pars pro toto manner as a placeholder for an activity segment that extends beyond the observed part of the event. As hypothesized, for the very same action segment, participants rated the time span that it takes to perform the target action in real-life as significantly longer if it was preceded by a large gap in the underlying activity stream than if it was preceded by a short gap (Experiment 1). Additionally, the length of a time gap had also an effect on the perceived presentation duration of the action segment. Observed action segments appear to be presented longer if they were preceded by short gaps in time than if they were preceded by long gaps in time (Experiment 2). Research on event cognition normally does not differentiate between different levels of time perception during event cognition (Schwan, 2013). The two studies described here tried to disentangle the viewers’ perception of the event duration and their perception of the presentation duration. Both are time characteristics of the stream of events, but according to the present results, they go in opposite directions.

Until now, research on event segmentation has shown that the occurrence of an action discontinuity strongly increases the probability of a segmenting decision (Magliano & Zacks, 2011), suggesting that the two actions on both sides of the discontinuity are perceived as belonging to different activity segments. The question arises how these segments are mentally represented: Do they coincide with the parts of the event that were actually observed or are they cognitively extended by inference, also including parts of the underlying stream of activities that were not directly observed? Results of Experiment 1 suggest that the latter holds true, indicating that in cases of piecemeal activity presentation, the observed actions are conceived as pars pro toto placeholders for larger parts of the underlying activity stream. Accordingly, viewers estimated the real-life duration of an activity preceded by a large temporal gap (and thus standing for a larger part of the underlying activity stream) as being longer than the real-life duration of the same activity following a brief temporal gap. According to Brewer and Lichtenstein (1980), discourse structure, that is, the structure of the presentation of a stream of events, in most cases differs from the underlying event structure, and it seems that participants in Experiment 1 cognitively filled in the gaps in the discourse structure to reconstruct a more complete and continuous event structure.

The question of what happens during the processing of sequences presented fragmentarily becomes especially relevant in media perception. For example, most often the duration of a movie is much shorter than the duration of the depicted event, would this event happen in reality (Bordwell, 1985). The present findings indicate that observers do not process fragmentary samples of parts of an activity as they are, but instead tend to interpret them as placeholders for a more extended underlying course of events, thereby also being influenced by what they did not see.

In addition, temporal gaps also seem to influence the processing of the presentation time itself. The length of temporal gaps influences not only what a presented action stands for (Experiment 1) but also how long the presentation time itself appears (Experiment 2). Time estimations measured in Experiment 2 did follow a contrast effect, which was found for physical or social judgments in literature (Sherif et al., 1958; Smither et al., 1988). It seems that the same factual shot duration appears to be shorter if it stands pars-pro-toto for longer activity parts than if it stands for shorter parts. Participants seem to perceive time spans of film shots against the background of the duration of the activities that these shots are conceived to stand for in real life, and therefore, film shots appear to be shorter if they represent more extended parts of an underlying stream of activities.

These findings are in accordance with the attentional gate model of prospective time perception (Zakay & Block, 1997). According to this model, viewers concentrating on another task have fewer attentional resources to monitor time pulses and therefore perceive time durations in reality to be shorter than they really are. It can be assumed that bridging abrupt changes in narrations needs all the more cognitive resources the longer the temporal gap is that has to be bridged. This is reflected in the findings of Experiment 2 when participants estimated the duration of a scene presentation to be shorter after seeing film clips with long gaps in time because the processing of long gaps is surely more resource demanding than processing of shorter gaps, and therefore, during the processing of long gaps, there are less resources available for the processing of time flow than during the processing of short time gaps.

Taken together, the present findings indicate that longer gaps in time during the perception of an event do indeed lead to elongating the amount of time a subsequent action stands for, and further, the length of these gaps does influence the perceived duration of a presentation. In sum, the duration of time sequences that we do not perceive when observing a piecemeal presentation of an event does influence what we process concerning the event that we observe as well as our subjective impression of time duration during the perception of the event.