Time perception is crucial for the functioning of human beings. We use our perception of time not only to regulate our behaviour but also to unify and make sense of our conscious experiences (Montemayor, 2017). To investigate human time perception, objective measurements of time (i.e., clock time) are compared with participants’ subjective estimates. This is often accomplished by instructing participants to keep track of time while engaging in a concurrent task. Participants then estimate how long the task took to complete. When this method is employed, time estimates are called prospective (Hicks et al., 1976). These instructions result in the intentional encoding of temporal information (Block & Zakay, 1997). However, people do not necessarily attend to the passage of time while engaging in their daily activities. Therefore, to better understand human time perception, experiments must also include conditions in which participants are unaware of an impending time estimate (Tobin et al., 2010). When this method is employed, time estimates are called retrospective, and they provide insight into the incidental encoding of temporal information.

Direct comparisons between prospective and retrospective time estimates are informative because they help reveal the role of intentional and incidental encoding in human time perception. Yet few such direct comparisons exist (Tobin et al., 2010), and most narrowly focus on durations ranging from seconds up to 2 minutes (e.g., Avni-Babad & Ritov, 2003; Boltz, 2005; Kurtz & Strube, 2003; Predebon, 1996). In fact, an 8-minute condition used in Tobin and Grondin (2009) is considered a long duration. Experiments that directly compare prospective and retrospective paradigms at even longer durations are rare (see Bakan, 1955; Bisson & Grondin, 2020; Tobin et al., 2010; Tobin & Grondin, 2009). This is in part because retrospective time estimates entail an important methodological challenge. As soon as a time estimate is requested, participants realize that keeping track of time is an important experimental consideration. Consequently, the retrospective paradigm becomes prospective after a single estimate, generally restricting this paradigm to between-participants designs.

Most importantly, concurrent tasks in this area of research differ greatly from one experiment to the next (e.g., listening to a piece of music vs. performing a Stroop task; Brown & Stubbs, 1992; Zakay & Fallach, 1984). These differences may be especially problematic because the cognitive load of a task (i.e., the task’s difficulty) has been shown to interfere with people’s perception of time (Block et al., 2010). Moreover, the difference between “easy” and “hard” conditions is typically based on a subjective assessment (e.g., Tobin & Grondin, 2009). It is therefore unclear to what extent cognitive load impacts time perception in relation to task duration and the knowledge of an impending time judgment under controlled conditions.

The goal of the current experiment was to investigate people’s perception of “long” durations under prospective and retrospective estimation conditions, using a visual and memory search paradigm (Schneider & Shiffrin, 1977; Shiffrin & Schneider, 1977). This classic task allowed us to vary and independently quantify cognitive load. This experiment was guided by contemporary research comparing prospective and retrospective time estimation.

The two time estimation paradigms

A common finding is that prospective estimates tend to be greater than retrospective ones (Brown, 1985; Kikkawa, 1983; Kurtz & Strube, 2003; Zakay, 1992). A summary of differences between estimation paradigms is provided by Block and Zakay (1997), who presented mean estimation ratios for 20 experiments. These ratios were calculated by dividing the participants’ estimated time by objective time in each experiment. Ratios closer to one indicate higher estimate accuracy. The mean ratio for prospective estimates was found to be .89 compared with .77 for retrospective estimates. Therefore, while time was underestimated in both conditions, prospective estimates were found to be more accurate in this sample of experiments because they were greater than retrospective ones.
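The estimation ratio described above can be illustrated with a minimal sketch. The function names and the example durations are hypothetical and are not taken from Block and Zakay (1997); only the definition (estimated time divided by objective time) comes from the text.

```python
def estimation_ratio(estimated_minutes: float, objective_minutes: float) -> float:
    """Return estimated / objective duration; 1.0 is a perfectly accurate estimate."""
    if objective_minutes <= 0:
        raise ValueError("objective duration must be positive")
    return estimated_minutes / objective_minutes


def direction_of_error(ratio: float) -> str:
    """Label a ratio as an underestimate, an overestimate, or accurate."""
    if ratio < 1:
        return "underestimate"
    if ratio > 1:
        return "overestimate"
    return "accurate"


# Hypothetical example: a participant judges a 60-minute task to have lasted 45 minutes.
ratio = estimation_ratio(45, 60)
print(ratio, direction_of_error(ratio))
```

Under this definition, mean ratios of .89 and .77 both indicate underestimation, with the prospective mean closer to 1.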

Most time estimates are obtained after participants engage in some sort of task, however. For example, past experiments examining prospective and retrospective timing ability have employed a range of activities, including a Stroop task (Predebon, 1999; Zakay & Fallach, 1984), memorizing and rehearsing lists of items (Miller et al., 1978), listening to a piece of music (Brown & Stubbs, 1988, 1992), and even passively watching a boiling pot of water (Block et al., 1980) or a flickering lightbulb (Zakay, 1992). The cognitive load of these tasks obviously varies greatly, therefore making it difficult to compare the results of time estimation among them.

In addition, prospective paradigms require participants to execute an assigned task while also keeping track of time (Zakay & Block, 2004). Arguably, participants make greater prospective estimates compared with retrospective ones (e.g., Hicks et al., 1976; Klapproth, 2007) because attention to time allows for more temporal information to be intentionally encoded throughout the interval. This effect is evidenced by studies that have adopted a dual-task paradigm in which a secondary task diverts attention away from the timing task (Brown & Stubbs, 1992; Predebon, 1996). The direction of attention away from timing results in participants encoding less information about time, which makes timing more variable and shortens the perceived duration (Brown et al., 2013; see also Brown, 1997).

Traditionally, the intentional encoding of time has been described using the attentional gate model (AGM) shown in Fig. 1 (Zakay & Block, 1997). In this model, the mind functions like a clock. An internal pacemaker, which is influenced by endogenous bodily processes like arousal, emits a rhythmic stream of temporal information called “pulses.” Timing begins when a switch is activated by instructing the person to start keeping track of time. This switch allows the stream of pulses to reach a cognitive counter (i.e., an accumulator). However, this mechanism is mediated by attention (i.e., the attentional gate). When attention is directed toward time, the attentional gate allows more pulses to be accumulated, thus lengthening subjective duration. When attention is diverted away from a timing task, it is assumed that this gate closes, and fewer pulses reach the accumulator (Gibbon et al., 1984), shortening subjective duration. At the end of the interval, reference memory is used to assign a conventional verbal label (e.g., minutes or seconds) to the total number of pulses accumulated by comparing the interval with previous experience.

Fig. 1 Attentional-gate model, based on Zakay and Block (1997)
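The pacemaker, gate, and accumulator logic of the AGM can be illustrated with a toy simulation. This is only a sketch under strong assumptions that are not part of the model as published: the pacemaker ticks at a fixed rate, and attention to time is treated as the probability that each pulse passes the gate. The function name and parameter values are hypothetical.

```python
import random


def accumulate_pulses(duration_s: float, pacemaker_hz: float,
                      attention_to_time: float, seed: int = 0) -> int:
    """Count the pulses that pass the attentional gate during one interval.

    attention_to_time is a value in [0, 1]; in this toy version it is the
    probability that a given pulse passes the gate (an assumption).
    """
    if not 0.0 <= attention_to_time <= 1.0:
        raise ValueError("attention_to_time must lie in [0, 1]")
    rng = random.Random(seed)
    n_emitted = int(duration_s * pacemaker_hz)  # fixed-rate pacemaker (simplification)
    return sum(1 for _ in range(n_emitted) if rng.random() < attention_to_time)


# Hypothetical 8-minute interval with a 5 Hz pacemaker.
focused = accumulate_pulses(480, 5, attention_to_time=0.9)
distracted = accumulate_pulses(480, 5, attention_to_time=0.5)
print(focused, distracted)
```

In this sketch, diverting attention away from time lowers the accumulated count, which corresponds to a shorter subjective duration once the count is converted to a verbal label.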

Unlike prospective paradigms, however, retrospective paradigms do not require participants to encode temporal information actively or intentionally. Participants instead give the nontemporal task their undivided attention (Brown, 1985). Estimates in retrospective paradigms are thought to rely heavily on contextual cues to code moments in memory, and several timing theories emphasize this assumption (Block & Reed, 1978). For instance, the contextual-change hypothesis considers the nature of events encoded during a target interval. For example, a participant might remember stimuli presented during the interval based on their position, complexity, or sensory modality. These contextual cues are ultimately encoded as events and provide information about how much time has passed. Thus, the greater the number of events recalled at the end of an interval, the longer a person’s time estimate will be (Ornstein, 1969). Because context changes rapidly at the onset of an experiment, intervals are judged to be longer at the beginning of a task (Hintzman et al., 1973). As time goes on and the context is no longer novel, fewer events are encoded in memory.

Cognitive load

According to the AGM and the contextual-change hypothesis, prospective and retrospective estimates should behave differently under heightened cognitive load. This idea was the focus of Block et al.’s (2010) meta-analysis, which found a cross-over interaction between cognitive load and estimation paradigm. When the cognitive load was low, prospective time estimates were greater than retrospective ones. The opposite finding emerged when the cognitive load was high. Block et al. argued that prospective estimates decreased as cognitive load increased because fewer cognitive resources were available for monitoring time. However, most of the experiments reviewed in Block et al.’s meta-analysis featured tasks that were quite short. For example, a 60-second task was labelled as a “long” duration in their analysis. Hence, it is unclear if their results hold at longer durations.

Choosing a concurrent task

If prospective time estimation requires attentional resources that can be diverted away by a concurrent task (Brown et al., 2013), then there must be a capacity limit on attentional resources available to accomplish both tasks. Therefore, adding a secondary task presents participants with a unique challenge. Dual-task performance is more demanding if resources needed to perform both tasks overlap or if limited resources need to be drawn from a shared pool (see Wickens, 2008, for a review). In fact, evidence suggests that the brain circuits engaged while keeping track of time (i.e., in accumulating pulses) are also involved in attention and working memory (Buhusi & Meck, 2009; Van Rijn et al., 2011, cited in Allman et al., 2014). With practice, however, fewer resources are necessary to accomplish a given task, freeing up attention and memory for other activities. Practice thus results in better dual-task performance. This process is known as automaticity (Logan, 1988), which has been studied extensively in visual search paradigms (e.g., Hélie & Cousineau, 2011; Lefebvre et al., 2008; Moors & De Houwer, 2006). Automaticity in visual search was clearly established when Schneider and Shiffrin (1977; Shiffrin & Schneider, 1977) formulated their theory of controlled versus automatic information processing. Controlled information processing requires attentional resources. In contrast, automatic processing is largely unaffected by task demands because it relies on long-term memory and allows attention to be deployed elsewhere.

The key manipulation in Schneider and Shiffrin’s visual and memory search experiments was whether the alphanumeric characters in the memory set on a given trial could act as distractors on subsequent trials. This manipulation was meant to distinguish between the effect of consistent versus varied mapping of targets to responses. Consistent mapping is obtained when a given response is always associated with a specific stimulus. In Schneider and Shiffrin’s experiments, a participant would consistently provide a “target-present” response to a given set of stimuli throughout the entire experiment. This way, their responses on the keyboard were consistently mapped to specific characters. Mapping a stimulus to a response like this allows for automaticity to build.

In contrast, varied mapping is obtained when there is no one-to-one correspondence between a given stimulus and its associated response. For example, specific digits or consonants can be made to appear in either the memory set or the distractor set so that participants’ responses are not mapped to the same stimuli across trials. Participants must also ignore interfering information because the target on a previous trial is not necessarily a target on the next, maintaining this need for a controlled search. Schneider and Shiffrin found that performance on a visual and memory search task was much better under consistent mapping conditions compared with varied mapping conditions. Specifically, participants were less accurate and responded more slowly under varied mapping conditions.

In addition to consistent stimulus–response associations, Cousineau and Larochelle (2004) identified two more factors that are conducive to the development of automaticity in this task. First, they found that searching for a category of targets among distractors from another category (e.g., looking for digits among letters) facilitated performance. They also established that reducing feature overlap between the targets and distractors (e.g., looking for the target “3” among the distractors “L, H, & R”) aided performance. Thus, our task selection for this experiment took into consideration all these factors to build a visual and memory search task that differentiated as much as possible the cognitive load entailed by the consistent and varied mapping conditions.

Adding prospective timing to visual search

In prospective timing conditions, participants are essentially being asked to multitask. Multitasking has been found to increase response caution, meaning that participants are slower to respond when performing multiple cognitive tasks at once (Howard et al., 2020). In other words, simply adding a concurrent task can change participants’ response strategy. Multitasking is distinct from increased task difficulty, however. Within a single task, changing the level of difficulty does not result in a strategy change, but drift rate decreases with difficulty regardless of whether the cognitive load is manipulated by adding a task or by increasing the difficulty within the same task (Howard et al., 2020).

The addition of a timing task thus differs fundamentally from simply adding a level of difficulty within a single task. Prospective timing, therefore, requires participants to approach the experiment differently compared with retrospective participants because they are instructed to divide their attentional resources, which can be seen as more cognitively demanding. However, for the purpose of this experiment, we use the term “cognitive load” to denote the stimulus–response mapping technique we used to manipulate the difficulty of the visual and memory search task (“consistent” vs. “varied”). Notably, time estimation performance is likely to change with prolonged time spent in the experiment, as practice increases automaticity (lowering cognitive load) in the visual and memory search task.

Task duration

Surprisingly, a review of the literature suggests that durations over 2 minutes are considered “long.” This finding was most clearly illustrated by Tobin et al. (2010) and is depicted in Table 1. These shorter experiments have generated the most robust evidence for a paradigm effect where prospective estimates were larger than retrospective ones (Block & Zakay, 1997). The few experiments that directly compared paradigms in tasks over 2 minutes have found mixed results. Some have failed to find a difference between prospective and retrospective time estimates (Bakan, 1955; Tobin & Grondin, 2009), while others have reported the typical overestimation effect in prospective paradigms (Bisson et al., 2012; Tobin et al., 2010).

Table 1 Past direct comparisons between prospective and retrospective time estimation

Considering how few long durations are present in the literature and a lack of consensus on whether there is a paradigm effect in experiments close to 1 hour, we chose to replicate two “long” durations that have been previously examined: 8 minutes (Tobin & Grondin, 2009) and 58 minutes (Tobin et al., 2010). The AGM and the contextual change hypothesis make specific predictions about duration estimation performance under varying cognitive loads. Therefore, this experiment aimed to put the generalizability of the AGM and contextual change hypothesis to the test. We thus manipulated the estimation paradigm and cognitive load at multiple long durations. To accomplish this, we measured time estimation accuracy and estimation uncertainty.

Examination of variability versus uncertainty

A hallmark of time estimation is that the variability of people’s time estimates increases with the length of the interval. However, their variability is typically proportional to the amount of time being estimated (Wearden, 2016; Wearden & Lejeune, 2008). A common method for examining the variability of time estimates is the coefficient of variation (CV; i.e., the standard deviation divided by the mean estimate of a series of trials). For example, prospective estimates tend to contain less variability than retrospective estimates (Brown, 1985; Brown & Stubbs, 1992; Predebon, 1995). In Block and Zakay (1997), the average coefficient of variation was .33 for prospective estimates and .37 for retrospective estimates. The average retrospective to prospective coefficient of variation ratio was 1.15, meaning that retrospective estimates contained 15% more variability on average than prospective estimates. However, for experiments that collect only a single time estimate from each participant, within-subject variability cannot be inferentially examined using this measure. Instead, the CV can only offer a descriptive measure of group variability. To account for this issue, participants can provide a window that they believe contains the duration (Block et al., 2018).
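As a minimal sketch of the CV computation described above, the following uses the sample standard deviation (an assumption, since the text does not specify sample versus population SD) and hypothetical estimates. It also illustrates the proportionality point: when estimates and their spread both scale with the interval, the CV stays constant.

```python
from statistics import mean, stdev


def coefficient_of_variation(estimates: list[float]) -> float:
    """Standard deviation of the estimates divided by their mean."""
    if len(estimates) < 2:
        raise ValueError("at least two estimates are needed")
    return stdev(estimates) / mean(estimates)  # sample SD (n - 1), an assumption


# Hypothetical group estimates, in minutes, for a short and a long interval.
short_interval = [6.0, 8.0, 10.0]
long_interval = [60.0, 80.0, 100.0]  # same estimates scaled by 10

print(coefficient_of_variation(short_interval))
print(coefficient_of_variation(long_interval))
```

Because only one estimate per participant is available in a design like this, such a CV describes variability across a group rather than within a person.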

There has been a recent move toward measuring variability as a subjective measure of uncertainty, especially in research examining long durations (Bisson et al., 2012; Bisson & Grondin, 2020; Tobin et al., 2010). By asking participants to provide a minimum and maximum time estimate, the participants’ overall confidence in their original estimate can be measured and then subjected to inferential analyses. For the present experiment, this measure will be referred to as internal uncertainty (IU). Therefore, in addition to providing a descriptive measure of group differences using the CV, we can investigate whether the subjective uncertainty of estimates differs across conditions.
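One way such an uncertainty score could be derived from the three responses is sketched below. The formula (the width of the minimum to maximum window divided by the point estimate) is an assumption made for illustration; the text above does not state how IU is computed, and the function name and example values are hypothetical.

```python
def internal_uncertainty(min_estimate: float, max_estimate: float,
                         point_estimate: float) -> float:
    """Width of the min-max window relative to the point estimate (assumed formula)."""
    if point_estimate <= 0:
        raise ValueError("point estimate must be positive")
    if min_estimate > max_estimate:
        raise ValueError("minimum estimate cannot exceed maximum estimate")
    return (max_estimate - min_estimate) / point_estimate


# Hypothetical participant: "about 8 minutes, at least 6, at most 10".
print(internal_uncertainty(6, 10, 8))
```

Because each participant yields one score, a measure of this kind can be entered into inferential analyses even when only a single time estimate is collected.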

The present experiment

The goal of the current experiment was to examine the impact of cognitive load and time estimation paradigm on time judgements of long durations. The participants were asked to engage in a visual and memory search task and to provide a single time estimate once the task was completed. They were assigned to one of eight between-subject conditions according to a 2 × 2 × 2 experimental design with three independent variables: cognitive load (low vs. high corresponding to consistent and varied mapping conditions, respectively), time estimation paradigm (prospective vs. retrospective), and task duration (8 vs. 58 minutes).

Hypotheses

Time estimation accuracy

Paradigm × cognitive load

We expected to find an effect of the timing paradigm typical of prospective and retrospective estimates. Specifically, compared with retrospective estimates, prospective time estimates were expected to be greater because participants in this condition paid more attention to time, allowing more internal time units to be counted by the internal clock according to the AGM. Considering prospective and retrospective estimates interact differently under heightened cognitive load, we expected to find a significant two-way interaction between paradigm and cognitive load (as defined by task difficulty).

In line with Block et al.’s (2010) cross-over interaction, prospective estimates were predicted to be shorter in the varied mapping condition (i.e., high cognitive load) compared with the consistent mapping condition (i.e., low cognitive load) because the more difficult task would require a reallocation of attention away from time perception. In contrast, as per the contextual change hypothesis, we predicted that a higher cognitive load would create more distinct and memorable events, creating more cues to recall when estimating time in retrospect. Therefore, opposite of prospective estimates, retrospective estimates would be greater in the varied mapping condition compared with the consistent mapping condition.

Paradigm × cognitive load × duration

The above effect was hypothesized to be moderated by the task’s duration. Specifically, the cross-over effect between cognitive load and paradigm would be most apparent in the 8-minute task. With time spent on the visual and memory search task, the attentional resources required by the task were expected to decrease as automaticity set in and cognitive load decreased. Additionally, fewer salient and distinct events would be encoded in memory with time, leading to shorter retrospective estimates.

Variability and uncertainty

We predicted that increased attention toward time in the prospective paradigm would result in less group variability on average compared with the retrospective condition, as measured by the CV. Moreover, variability was expected to scale with the length of a task according to the scalar property of variance (Allman et al., 2014; see Footnote 1). Finally, the CV was predicted to be greater in the varied mapping condition compared with the consistent mapping condition as would be expected from Block et al.’s (2010) results.

Our examination of IU was exploratory considering few experiments have examined this variable. In Tobin et al. (2010), estimate uncertainty did not differ between prospective and retrospective conditions, as measured by IU (termed Weber’s fraction variant in their paper). However, their shortest condition, which lasted 12 minutes, contained more estimate uncertainty compared with their 35- and 58-minute conditions, demonstrating a potential disconnect between group variability and uncertainty measures of time estimation. We predicted that we would find a similar disconnect.

Method

Participants

A total of 318 students from Carleton University with normal or corrected-to-normal vision were recruited for this experiment (see Footnote 2). As compensation for their time, participants received 1.5% toward research participation credits. This study was approved by Carleton University’s Research Ethics Board. Two hundred and ninety-two participants between the ages of 17 and 64 (mean = 19.61, median = 19) completed the experiment. This resulted in a total of 37 participants per condition, with one condition (the prospective, high cognitive load, 58-minute condition) consisting of 38 participants (see Footnote 3). Overall, 214 of the participants were female, 74 participants were male, one participant was transgender, and three participants did not indicate a gender.

Experimental design

The experiment consisted of a 2 (paradigm: prospective vs. retrospective) × 2 (duration: 8 minutes vs. 58 minutes) × 2 (cognitive load: high vs. low) between-subjects design. As such, four out of the eight groups received instructions to keep track of time throughout the task, whereas the other four groups did not. Participants were randomly assigned to one of these eight experimental conditions. All performed a visual and memory search task.
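The eight cells of this design can be enumerated as in the sketch below. The assignment scheme (shuffle, then deal participants across cells in turn) is an assumption chosen to keep group sizes balanced; the text states only that assignment was random, and all names here are hypothetical.

```python
import itertools
import random

PARADIGMS = ("prospective", "retrospective")
DURATIONS_MIN = (8, 58)
LOADS = ("low", "high")

# Full crossing of the three factors: 2 x 2 x 2 = 8 between-subjects cells.
CONDITIONS = list(itertools.product(PARADIGMS, DURATIONS_MIN, LOADS))


def assign_conditions(participant_ids, seed: int = 0) -> dict:
    """Shuffle participants, then deal them across the cells in turn (assumed scheme)."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(ids)}


assignment = assign_conditions(range(16))
timed_cells = [c for c in CONDITIONS if c[0] == "prospective"]
print(len(CONDITIONS), len(timed_cells))
```

Four of the eight cells carry the prospective instruction, matching the description above.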

Visual and memory search task

The experiment was programmed with E-Prime (Psychology Software Tools, Pittsburgh, PA) on a PC. The stimuli selected to create the consistent and varied mapping conditions were based on Cousineau and Larochelle (2004). All memory set items (i.e., the targets) and nontarget items (i.e., distractors) consisted of alphanumeric characters chosen from a set of eight (L, R, S, H, 1, 3, 6, 7). The memory set and the search array always contained four alphanumeric characters arranged in two lines of two characters each, located directly in the center of the screen. The characters were presented in white Times New Roman 18-point font. The background was black.

A memory set size of four yields significant performance differences between consistent and varied mapping conditions (Schneider & Shiffrin, 1977) and was thus best suited for our cognitive load manipulation. On target-present trials, one of the characters in the search array was a target, while the other three were distractors. On target-absent trials, none of the four characters in the search array contained a target. For each participant in the low cognitive load conditions, stimuli were consistently mapped to responses. In these consistent mapping conditions, four characters were chosen to remain as targets across all trials. Furthermore, these characters were either all digits or all letters. The other four characters were used as the distractors. As such, participants in the low load condition who saw letters as their targets throughout the task always looked for a letter among digits in the search array. Those who saw digits as targets always looked for digits among letters. Thus, mapping was not only consistent, but based on preexisting categories (digits vs. letters). In the varied mapping condition, four characters were randomly selected to serve as the targets and the remaining four characters were randomly selected to be distractors on each trial. The targets could thus consist of both digits and letters that change on every trial so that high performance on this task required controlled processing in this condition.
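The difference between the two mapping conditions can be sketched as follows. The character set comes from the text; the function names, the sampling details, and the assumption that a target-absent array shows each of the four distractors once are illustrative only.

```python
import random

LETTERS = ("L", "R", "S", "H")
DIGITS = ("1", "3", "6", "7")
ALL_CHARS = LETTERS + DIGITS


def consistent_sets(targets_are_letters: bool):
    """Low load: targets come from one category on every trial, distractors from the other."""
    return (LETTERS, DIGITS) if targets_are_letters else (DIGITS, LETTERS)


def varied_sets(rng: random.Random):
    """High load: four random targets per trial; the remaining four are distractors."""
    targets = tuple(rng.sample(ALL_CHARS, 4))
    distractors = tuple(c for c in ALL_CHARS if c not in targets)
    return targets, distractors


def make_search_array(targets, distractors, target_present: bool, rng: random.Random):
    """Build a four-item array: one target plus three distractors, or distractors only."""
    if target_present:
        array = [rng.choice(targets)] + rng.sample(distractors, 3)
    else:
        array = list(distractors)  # assumption: each distractor shown once
    rng.shuffle(array)
    return array


rng = random.Random(1)
targets, distractors = varied_sets(rng)
print(targets, make_search_array(targets, distractors, True, rng))
```

In the varied case, a character that is a target on one trial can be a distractor on the next, which is what prevents a stable stimulus–response mapping from forming.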

Visual and memory search procedure

All participants received a detailed explanation of the visual and memory search task procedure before beginning the experiment. Participants in the prospective time-keeping condition were told that both their accuracy in the visual and memory search task and their ability to keep track of the time were equally important for the experiment.

The procedure for the visual and memory search task was also largely based on Cousineau and Larochelle (2004). A typical trial is illustrated in Fig. 2 and proceeded as follows. First, a fixation star appeared for 500 ms. This was followed by a memory set containing four target characters for 1,000 ms, and then a fixation star for 500 ms. The visual search array containing distractors (and one of the targets on target-present trials) was then presented for a maximum of 3,000 ms or until a response was made. Finally, feedback on their response was displayed for 900 ms. The interstimulus interval (ISI) lasted 100 ms plus the remaining time carried over from the 3,000 ms maximum of the visual search display. Consequently, every trial took 6,000 ms regardless of the participants’ RTs. There were 80 trials in the 8-minute task and 580 trials in the 58-minute task. There was also an equal number of target-present and target-absent trials for a total of 40 target-present trials in the 8-minute task and 290 in the 58-minute task.
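The fixed trial length follows directly from the timings listed above, as the arithmetic sketch below shows. The constant and function names are hypothetical; the durations are those given in the text.

```python
FIXATION_MS = 500      # shown twice: before the memory set and before the search array
MEMORY_SET_MS = 1000
SEARCH_MAX_MS = 3000
FEEDBACK_MS = 900
BASE_ISI_MS = 100


def isi_ms(rt_ms: float) -> float:
    """ISI = 100 ms plus whatever is left of the 3,000 ms search window."""
    rt = min(rt_ms, SEARCH_MAX_MS)
    return BASE_ISI_MS + (SEARCH_MAX_MS - rt)


def trial_duration_ms(rt_ms: float) -> float:
    """Total trial time; the ISI absorbs any unused search time."""
    rt = min(rt_ms, SEARCH_MAX_MS)
    return 2 * FIXATION_MS + MEMORY_SET_MS + rt + FEEDBACK_MS + isi_ms(rt_ms)


def task_minutes(n_trials: int, rt_ms: float = 1000) -> float:
    """Task length in minutes for a given number of trials."""
    return n_trials * trial_duration_ms(rt_ms) / 60_000


print(trial_duration_ms(750), task_minutes(80), task_minutes(580))
```

Because the ISI compensates for response speed, 80 and 580 trials yield 8 and 58 minutes respectively, whatever the participant's RTs.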

Fig. 2 Visual and memory search task procedure for a target-present, low cognitive load condition

Participants responded using the “Z” and “M” keys with their left and right hand, respectively. These response keys were counterbalanced so that half the participants pressed “M” when they perceived the target as present and half pressed “M” when they perceived the target as absent. Visual feedback informed the participant of their accuracy on each trial (“CORRECT” or “INCORRECT”). Feedback on mean accuracy was also provided here as a percentage (e.g., “96.00% accurate”). If no response was made, the trial was recorded as null, and “INCORRECT” appeared on the screen, prompting the participant to increase their attention on subsequent trials. Each participant had 20 practice trials before beginning the experiment to familiarize themselves with the task.

General procedure

Participants completed the experiment alone in a quiet room. They were asked to leave their phone, music devices, headphones, bracelets, watches, and any other distracting items in a container located inside the testing room. They were also asked to turn off the electronic devices. The container was placed out of the participant’s reach yet inside the testing room to ensure they were not preoccupied with the location of their valuables. Participants were told that the rationale for removing these items was that distractions like wrist jewelry can sometimes interfere with quick motor responses. Finally, there were no clocks or other distractions that could interfere with their search performance.

All participants were informed that the task required each person to complete a randomly selected number of experimental trials chosen by the experimental software. They were informed that the task would not take more than 90 minutes. This 90-minute cap was strategically chosen so as not to pull the estimates too far away from the actual time elapsed in the 58-minute conditions, yet it was still large enough to allow for estimate variation.

Participants in the retrospective conditions were told that the purpose of the experiment was to study performance in a visual and memory search task. This was in line with research in which the task was disguised as testing other perceptual processes (Brown, 1985). They were told that having a random number of trials helps determine the effect of training. Participants in prospective conditions knew that their perception of time was being studied. However, they were also informed that an equally important experimental purpose was the investigation of accuracy in the visual and memory search task. All participants were thus told to be as accurate as possible in their visual and memory search performance.

All participants saw the same prompts after completing the visual and memory search task. They were asked to provide an estimate of how long the visual and memory search task took to complete. They were also asked to provide a maximum and a minimum estimate to determine how confident they were about their answers. The participants typed their response, in minutes, in a text box provided on screen. Participants were also asked if they had complied with the instructions not to use personal devices during the experiment.

Results

Participant screening

Five participants were excluded because they reported checking the time throughout the task. All five of these participants belonged to a retrospective, 58-minute condition (four from the low cognitive load and one from the high cognitive load condition). Additionally, one participant from a retrospective condition was excluded because she guessed the true purpose of the experiment. This participant reported mentally keeping track of time throughout the task.

Five participants performed the visual and memory search task at chance. They were removed from further analyses because they were likely inattentive to the task and were thus unaffected by the cognitive load manipulation. Participants’ visual and memory search data were then examined for outliers. Each participant’s accuracy was analyzed with respect to the average performance of their group. Those whose overall accuracy fell more than three standard deviations above or below their group’s mean were excluded from the analyses. This resulted in the removal of an additional four participants’ data. All four participants belonged to a prospective, 58-minute condition (one from the high load and three from the low load task). The final sample for this experiment thus consisted of 273 participants after data cleaning. This resulted in 30 to 36 participants per group. They ranged in age from 17 to 53 years (see Footnote 4).
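The three-standard-deviation screening rule can be sketched as follows. The use of the sample SD and the example accuracies are assumptions made for illustration; the function name is hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(group_accuracies: list[float], n_sd: float = 3.0) -> list[bool]:
    """Flag accuracies more than n_sd sample SDs from their own group's mean."""
    m = mean(group_accuracies)
    s = stdev(group_accuracies)  # sample SD (n - 1), an assumption
    return [abs(a - m) > n_sd * s for a in group_accuracies]


# Hypothetical group of 20: nineteen accurate participants and one far below the rest.
group = [0.95] * 19 + [0.50]
flags = flag_outliers(group)
print(flags.count(True))
```

The rule is applied within each experimental group, so a score is judged only against participants who faced the same mapping condition, paradigm, and duration.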

Manipulation check

Visual and memory search accuracy

In this experiment, varied mapping was meant to implement a high cognitive load and consistent mapping was meant to implement a low cognitive load. To investigate whether this manipulation was successful, a 2 (estimation paradigm: prospective vs. retrospective) × 2 (task duration: 8 vs. 58 minutes) × 2 (response mapping: consistent vs. varied mapping) analysis of variance (ANOVA) was performed on participants’ mean visual and memory search task accuracy. The results of the ANOVA are shown in Fig. 3.

Fig. 3 Mean percentage of correct responses on the visual and memory search task. The left panel contains the results of participants in the prospective time estimation paradigm. The right panel contains the results of participants in the retrospective time estimation paradigm. Error bars show the 95% confidence intervals of the mean

No main effect of prospective vs. retrospective time estimation paradigm was found, F(1, 265) = .68, p = .41, ɳp2 = .003. However, there was a main effect of response mapping, F(1, 265) = 389.53, p < .001, ɳp2 = .60, a main effect of task duration, F(1, 265) = 11.06, p = .001, ɳp2 = .04, and a significant two-way interaction between response mapping and task duration, F(1, 265) = 7.16, p = .008, ɳp2 = .03. Using an alpha level of .025 to correct for multiple comparisons, we found that the duration of the task significantly impacted accuracy in the varied mapping condition only, F(1, 265) = 18.68, p < .001, ɳp2 = .06. Specifically, mean accuracy in the varied mapping, 8-minute condition (M = 81.8%, SD = 8.5%, 95% CI [80.4%, 83.2%]) was significantly lower than mean accuracy in the varied mapping, 58-minute condition (M = 86.1%, SD = 7.5%, 95% CI [84.7%, 87.5%]). In other words, higher accuracy was observed when participants had more practice on the task. However, there was no significant difference in accuracy between the 8- and 58-minute tasks when response mapping was consistent. This task was evidently easy without much practice. The remaining two-way interactions were not significant (ps > .43, ɳp2 < .002) and the three-way interaction was not significant either, F(1, 265) = .05, p = .83, ɳp2 < .001. The response accuracy results thus suggest that the cognitive load manipulation was successful. The observed interaction simply revealed that 58 minutes of practice is better than 8 minutes of practice in a difficult task.
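For readers who wish to relate the reported F ratios to the partial eta squared values, the standard identity linking the two can be sketched as below. The function name is hypothetical, and the sketch is not claimed to be how the effect sizes in this article were computed.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom:
    (F * df_effect) / (F * df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effects of response mapping and task duration reported above.
mapping_effect = round(partial_eta_squared(389.53, 1, 265), 2)
duration_effect = round(partial_eta_squared(11.06, 1, 265), 2)
```

With a single numerator degree of freedom, the identity reduces to F / (F + df_error), so larger F ratios on the same error term always correspond to larger effect sizes.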

Visual and memory search RTs

In line with previous research, response times (RTs) below 200 ms were removed from the analysis because they were assumed to represent response anticipation (Wolfe et al., 2010). This resulted in the removal of 1,544 trials (1.5% of trials). Finally, only the RTs of accurate trials were used in the analysis. On average, target-present trials (M = 968.55 ms, SD = 222.91 ms) were significantly faster than target-absent trials (M = 1156.47 ms, SD = 407.67 ms), t(272) = 12.68, p < .001. Given that target-present trials were also less accurate than target-absent trials, this pattern reflects the typical speed–accuracy trade-off seen in simple decision tasks. As such, the pattern of RTs was as expected.
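The two trial-level filters described above (dropping anticipatory responses and keeping accurate trials only) can be sketched as follows. The data layout, a list of `(rt_ms, correct)` pairs, and the function name are assumptions made for illustration; the trial values are invented.

```python
def trim_rts(trials, min_rt_ms=200):
    """Keep the RTs of accurate trials whose RT is not below min_rt_ms.

    `trials` is a list of (rt_ms, correct) pairs; this layout is hypothetical.
    """
    return [rt for rt, correct in trials if rt >= min_rt_ms and correct]


# Hypothetical trials: one anticipation, one error, and two usable responses.
trials = [(150, True), (200, True), (850, False), (920, True)]
usable_rts = trim_rts(trials)
```

Applying the anticipation cutoff before computing participant means keeps very fast guesses from pulling those means downward.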

A 2 (estimation paradigm: prospective vs. retrospective) × 2 (task duration: 8 vs. 58 minutes) × 2 (response mapping: consistent vs. varied mapping) ANOVA was next performed on participants’ RTs of accurate trials. The ANOVA revealed a main effect of response mapping, F(1, 265) = 388.11, p < .001, ɳp2 = .59. In addition, a significant interaction was found between estimation paradigm and response mapping, F(1, 265) = 6.08, p = .01, ɳp2 = .02, and a significant three-way interaction was found between time estimation paradigm, duration, and response mapping, F(1, 265) = 6.43, p = .01, ɳp2 = .02. These results are presented in Fig. 4. None of the other main effects and two-way interactions were significant (ps > .14, ɳp2 < .008).

Fig. 4

Mean response time (RT) in milliseconds (ms) on the visual and memory search task. The left panel contains the results of participants in the prospective time estimation paradigm. The right panel contains the results of participants in the retrospective time estimation paradigm. Error bars show the 95% confidence intervals of the mean

To understand the three-way interaction, the ANOVA was decomposed and an alpha level of .025 was used to correct for multiple comparisons. A significant simple interaction was found between estimation paradigm and response mapping in the 8-minute condition, F(1, 265) = 13.06, p < .001. However, there was no Paradigm × Mapping simple interaction effect in the 58-minute condition, F(1, 265) = .002, p = .96. Second-order simple effects were then analyzed for the 8-minute condition. The effect of the estimation paradigm on RTs was significant for participants who performed the varied mapping task, F(1, 265) = 14.11, p < .001 (corrected alpha level of .016). That is, participants had faster RTs in this task if they were also in the prospective condition compared with the retrospective condition (mean difference = 175.53 ms, 95% CI [83.51, 267.55]). In the consistent mapping condition, no second-order simple effect of paradigm was found at 8 minutes, F(1, 265) = 1.84, p = .18.

Time estimation results

The mean duration estimates for our three estimation questions are outlined in Table 2 according to each of our eight groups.

Table 2 Group mean duration estimates (as well as group mean minimum and maximum estimates) in minutes

A 2 (paradigm: prospective vs. retrospective) × 2 (duration: 8 minutes vs. 58 minutes) × 2 (cognitive load: high vs. low) between-subjects ANOVA was conducted on each of the three key variables outlined below (based on Tobin et al., 2010; see Footnote 5).

Duration estimate ratio results

First, a duration-estimate ratio (RATIO) was calculated following Tobin et al. (2010) by dividing each participant’s time estimate (not including the practice trials; ED) by the actual task duration (TD). If the task was overestimated, this value was greater than 1:

$$ \mathrm{RATIO}={\mathrm{E}}_{\mathrm{D}}/{\mathrm{T}}_{\mathrm{D}}. $$
(1)

Four participants in this study did not answer the time estimation question. Additionally, participants’ time estimates were screened for outliers. Typically, time estimates are removed if they fall more than three standard deviations below or above the mean of their group (Tobin & Grondin, 2009). Two participants provided outlying estimates. One participant belonged to the prospective, high load, 58-minute condition, and the other belonged to the retrospective, low load, 8-minute condition. In both cases, the participants severely overestimated the duration of their task. These participants were removed from the analysis of RATIO.

Estimates ranged widely from overestimation to underestimation, depending on the group. Specifically, the ANOVA revealed a significant main effect of duration, F(1, 259) = 142.33, p < .001, ɳp2 = .36. On average, participants overestimated the duration of the 8-minute task (M = 1.33, SD = .52, 95% CI [1.26, 1.40]) and underestimated the duration of the 58-minute task (M = .73, SD = .24, 95% CI [.66, .80]). This result is in line with Tobin and Grondin (2009), who found a significant overestimation in their 8-minute condition (M = 1.44) and underestimation in their 24-minute condition (M = .80). No other significant main effects were found (ps > .37, ɳp2 < .004).

Although participants in both paradigms overestimated the task’s duration on average in the 8-minute condition, the above results are qualified by a significant two-way interaction between estimation paradigm and task duration, F(1, 259) = 4.73, p = .03, ɳp2 = .02. Following examination of the simple main effects using an alpha level of .025, estimation paradigm had a significant effect on participants’ RATIO, but only in the 8-minute task, F(1, 259) = 5.38, p = .02, ɳp2 = .02. As seen in Fig. 5, participants overestimated the 8-minute task’s duration more if they were part of the prospective condition (M = 1.40, SD = .51, 95% CI [1.25, 1.52]) as opposed to the retrospective one (M = 1.29, SD = .50, 95% CI [1.16, 1.43]). The other two-way interactions were not significant (ps > .45, ɳp2 < .002) and no significant three-way interaction was found, F(1, 259) = .44, p = .51, ɳp2 = .002.

Fig. 5

Mean time estimation RATIOs (participants’ duration estimate over the actual task duration). Left panel = participants in the prospective time estimation paradigm; right panel = participants in the retrospective time estimation paradigm. Error bars show the 95% confidence intervals of the mean

Absolute standard error results

To determine how far the estimates deviated from real time, the second dependent variable was the absolute standard error (ASE). A larger ASE meant that the estimate was further from the actual task duration. These results are outlined in Fig. 6. ASE was calculated by taking the absolute difference between a time estimate and the task duration (ED − TD) and dividing this by the task duration (TD):

$$ \mathrm{ASE}=\left|{\mathrm{E}}_{\mathrm{D}}-{\mathrm{T}}_{\mathrm{D}}\right|/{\mathrm{T}}_{\mathrm{D}}. $$
(2)

Fig. 6

Absolute standard errors (the absolute difference between the participants’ duration estimate and the actual task duration, divided by the actual task duration). Left panel = participants in the prospective time estimation paradigm; right panel = participants in the retrospective time estimation paradigm. Error bars show the 95% confidence intervals of the mean

Contrary to Tobin et al. (2010), there was a main effect of estimation paradigm, F(1, 259) = 4.58, p = .03, ɳp2 = .02. On average, participants in a prospective paradigm provided estimates that were further away from the task’s actual duration (M = .43, SD = .29, 95% CI [.37, .48]) compared with participants who were in a retrospective paradigm (M = .34, SD = .37, 95% CI [.29, .40]). The paradigm effect was only marginally significant in Tobin et al. (p = .067, ɳp2 = .03). They too found that ASE was higher in the prospective paradigm (M = .49) than it was in the retrospective paradigm (M = .36). As in Tobin et al. (2010), a main effect of duration was also found in the present experiment, F(1, 259) = 12.11, p < .001, ɳp2 = .05. Participants who were in an 8-minute condition provided estimates further away from the actual duration (M = .45, SD = .42, 95% CI [.40, .51]) than participants who were in a 58-minute condition (M = .32, SD = .18, 95% CI [.26, .37]). Thus, the shortest task duration was the one with the highest ASE.

There was no significant effect of cognitive load on ASE (p = .90, ɳp2 = .001), nor were there any two-way interactions found (ps > .19, ɳp2 < .006). Finally, no three-way interaction was found (p = .18, ɳp2 = .007).

Internal uncertainty results

Participants’ estimation uncertainty was measured next using internal uncertainty (IU). Higher IU indicated higher estimate uncertainty. IU was calculated by taking the maximum (MaxT) and minimum (MinT) time estimates from each participant, finding the difference, and dividing this difference by the actual task duration (TD):

$$ \mathrm{IU}=\left({\mathrm{Max}}_{\mathrm{T}}-{\mathrm{Min}}_{\mathrm{T}}\right)/{\mathrm{T}}_{\mathrm{D}}. $$
(3)
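The three dependent variables follow directly from their verbal definitions: RATIO is the estimate over the task duration, ASE is the absolute estimation error over the task duration, and IU is the range between the maximum and minimum estimates over the task duration. A minimal sketch is given below; the function names and the example participant are hypothetical and are not taken from the study's data.

```python
def ratio(estimate_min, task_min):
    """Duration-estimate ratio: values above 1 indicate overestimation."""
    return estimate_min / task_min


def absolute_standard_error(estimate_min, task_min):
    """Absolute estimation error expressed as a proportion of the task duration."""
    return abs(estimate_min - task_min) / task_min


def internal_uncertainty(max_min, min_min, task_min):
    """Width of the reported estimate range relative to the task duration."""
    return (max_min - min_min) / task_min


# Hypothetical participant in an 8-minute condition: point estimate of 10 minutes,
# with minimum and maximum estimates of 8 and 12 minutes.
example_ratio = ratio(10, 8)                  # 1.25, an overestimation
example_ase = absolute_standard_error(10, 8)  # 0.25
example_iu = internal_uncertainty(12, 8, 8)   # 0.5
```

Dividing each measure by the task duration puts the 8-minute and 58-minute conditions on a common proportional scale, which is what allows them to be compared in the same ANOVA.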

Because the minimum and maximum estimates were used to derive the IU, they too were screened for null and outlying responses. All participants provided a minimum time estimate. However, two participants did not provide a maximum. Moreover, one participant provided “6 minutes” as their maximum after providing a minimum estimate of “55 minutes.” Therefore, this response was removed from further analysis. Finally, three participants provided either a maximum or minimum estimate that exceeded three standard deviations from the mean of their respective groups. Thus, their responses were also removed.

There was no significant main effect of estimation paradigm (p = .08, ɳp2 = .01), nor was there a significant main effect of cognitive load on IU (p = .68, ɳp2 = .001). There was, however, a significant main effect of duration, F(1, 261) = 77.56, p < .001, ɳp2 = .23. The average IU of the 8-minute condition (M = .69, SD = .52, 95% CI [.63, .80]) was higher than that of the 58-minute condition (M = .26, SD = .23, 95% CI [.20, .33]). As with RATIO, a two-way interaction was found between estimation paradigm and task duration, F(1, 261) = 7.16, p = .008, ɳp2 = .03. Analysis of simple main effects using an adjusted alpha level of .025 revealed that estimation paradigm had a significant effect on IU in the 8-minute condition only, F(1, 261) = 10.36, p = .001, ɳp2 = .04. Specifically, participants in the 8-minute condition had a higher IU if they were also in the prospective condition (M = .80, SD = .57, 95% CI [.71, .90]) as opposed to the retrospective one (M = .59, SD = .43, 95% CI [.49, .68]). Furthermore, the other two-way interactions were not significant (ps > .09, ɳp2 < .01). There was no three-way interaction either, F(1, 261) = .41, p = .52, ɳp2 = .002. These results are shown in Fig. 7.

Fig. 7

Internal uncertainty (IU; the difference between the participants’ maximum and minimum time estimates over the actual task duration). Left panel = participants in the prospective time estimation paradigm; right panel = participants in the retrospective time estimation paradigm. Error bars show the 95% confidence intervals of the mean

As in Tobin et al. (2010), the 58-minute task in the present study was perceived as proportionally shorter (as seen by the RATIO results), was estimated more accurately (smaller ASE), and contained proportionally less uncertainty (smaller IU) compared with the shorter task.

The coefficient of variation (CV)

The CV for each condition was found by dividing the group’s standard deviation by its mean. Overall, the average CV was 0.69 for the prospective paradigm and 0.77 for the retrospective paradigm. Thus, the retrospective to prospective CV ratio was 1.11. In other words, the retrospective group contained 10% more variability of estimates than the prospective group. This is slightly smaller than the CV ratio found in very short tasks (a ratio of 1.15; Block & Zakay, 1997). Yet the difference in variability is in the same direction as shorter tasks (larger for retrospective estimates). Moreover, a variability difference was found between the high (CV = .66) and low (CV = .72) cognitive load conditions (CV ratio = 1.09). The low cognitive load condition contained 9% more estimation variability than the high load condition. Finally, the 8-minute to 58-minute CV ratio was 1.25. The 8-minute condition (CV = .40) thus contained 25% more estimation variability than the 58-minute one (CV = .32).
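The CV and CV-ratio computations described above can be sketched as follows. The function names and the three example estimates are hypothetical, and the use of the sample standard deviation is an assumption; the two ratios recomputed at the end use the CV values reported in the text for the cognitive load and duration comparisons.

```python
from statistics import mean, stdev


def coefficient_of_variation(estimates):
    """Group standard deviation divided by the group mean."""
    return stdev(estimates) / mean(estimates)  # assumption: sample (n - 1) SD


def cv_ratio(cv_larger, cv_smaller):
    """Ratio of two CVs; 1.25 means 25% more relative variability."""
    return cv_larger / cv_smaller


# Hypothetical time estimates (in minutes) from three participants.
example_cv = coefficient_of_variation([8, 10, 12])  # SD of 2 over a mean of 10

# Ratios recomputed from the CVs reported in the text.
load_ratio = round(cv_ratio(0.72, 0.66), 2)      # low vs. high cognitive load
duration_ratio = round(cv_ratio(0.40, 0.32), 2)  # 8-minute vs. 58-minute condition
```

Because the CV divides variability by the mean, it allows the spread of estimates to be compared across conditions whose mean estimates differ greatly, such as the 8-minute and 58-minute tasks.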

Discussion

The goal of this research was to consider the impact of task duration and cognitive load on people’s prospective and retrospective estimation of time. Block et al. (2010) reported an interaction between estimation paradigm and cognitive load. However, most of the literature on human time perception focuses on durations ranging from a few hundred milliseconds to a few minutes. Thus, this experiment sought to replicate Block et al.’s (2010) findings for a long (i.e., 8-minute) and a very long (i.e., 58-minute) duration. Importantly, this experiment also directly assessed the amount of cognitive load demanded of a task. Past research has either not considered the impact of cognitive load at long durations (e.g., Tobin et al., 2010) or has not independently measured it (e.g., Tobin & Grondin, 2009). Furthermore, whether traditional models of human time perception can generalize to longer experiments remains unclear in the broader literature (Matthews & Meck, 2014). To address this issue, our experiment used consistent and varied stimulus–response mappings in a visual and memory search task (Cousineau & Larochelle, 2004; Schneider & Shiffrin, 1977).

Duration estimation (RATIO and ASE)

No three-way interactions were found between cognitive load, task duration, and estimation paradigm on RATIO and ASE results, contrary to what we predicted. However, when examining the RATIO results, differences between estimation paradigms (prospective vs. retrospective) were found in the 8-minute task. Although participants overestimated the 8-minute task by more than 33% in both estimation paradigms, overestimation was greater when estimates were prospective. This paradigm × duration effect is in line with what would be expected under the AGM (Zakay & Block, 1997). Paying attention to time increases the amount of information used by the mind’s internal clock to derive a time estimate. Participants presumably paid more attention to time in the prospective estimation paradigm. Thus, more temporal information was captured. However, our 58-minute task does not appear to fit perfectly within either the AGM framework or the contextual change framework. Instead, participants underestimated the 58-minute task to an equal degree, regardless of the estimation paradigm. This result thus does not replicate the significant overestimation effect found in Tobin et al.’s (2010) 58-minute, prospective condition, but does replicate Bakan’s (1955) original finding that there is no estimation paradigm effect in a task that was almost one hour long.

It should be noted, however, that a main effect of estimation paradigm was observed when ASEs were analyzed. That is, prospective estimates in this experiment were further away from the actual task’s duration on average compared with retrospective estimates, which fits well with the AGM. Yet our cognitive load manipulation provided evidence that was contradictory to this model’s prediction. We did not even find an effect of cognitive load on timing performance. Our varied mapping condition in the visual and memory search task should have diverted more attention away from temporal processing when prospective timing was employed. It should have also increased the number of contextual cues available when making retrospective estimates. This was not so. Perhaps traditional cognitive models are less fitting for time estimation in longer tasks than are typically employed in the literature.

In fact, cognitive load yielded no significant interactions with duration or paradigm. This is particularly surprising because our high load version of the visual and memory search task was easier to perform after 58 minutes compared with 8 minutes (as measured by search accuracy in these conditions). As such, the cognitive load should have had its strongest influence on time estimation accuracy at 8 minutes, and this should have been especially true in the prospective condition where participants were multitasking in addition to experiencing this high load visual and memory search task. After all, higher cognitive load typically results in shorter perceived duration, and this has been reported in a wide range of experiments (Brown, 1997). At the very least, one would expect that dividing attention between a temporal and nontemporal task in the prospective paradigm would still yield performance differences.

We were thus curious to determine if prospective instructions interfered with visual and memory search performance. Looking at our manipulation check for cognitive load, a three-way interaction was observed between cognitive load, duration, and paradigm on participants’ RTs. However, this interaction is in the opposite direction of what would be expected under dual-task interference. Performance on the visual and memory search task should have been worst when participants were in the varied mapping condition, had less practice (i.e., at 8 minutes), and were engaged in concurrent prospective timing. Therefore, this group should have exhibited the slowest RTs, and search accuracy should have been especially poor. Instead, accuracy on the visual and memory search task was equivalent across paradigms, and RTs in the varied mapping, prospective condition were faster at 8 minutes. Perhaps, in the prospective condition, people simply adopted a different strategy. It appears they responded quickly at the beginning of the task to ensure that they had enough time to keep track of time. Surprisingly, this strategy is the opposite of the heightened response caution typical of multitasking environments (Howard et al., 2020). As the task progressed (i.e., for those in the 58-minute task), participants seemed to slow down and become more comfortable doing both the time keeping and the visual and memory search task simultaneously.

Internal uncertainty

A two-way interaction was observed between the estimation paradigm and task duration on IU results. In the 8-minute task, participants in the prospective condition displayed greater uncertainty in their estimates compared with those in the retrospective condition. This aligns with the RATIO and ASE data, in which participants in the prospective condition overestimated the 8-minute task. In other words, participants who explicitly paid attention to time were more uncertain of their estimates (i.e., they provided larger confidence ranges). Future research should investigate whether there is a causal relation between prospective estimation and uncertainty. Perhaps participants are aware of their attentional bias when prospectively timing and thus widen their reported range as a result. This would suggest that people can report a level of confidence in their estimates that aligns with task performance, a necessary skill for guiding human behaviour (Fleming & Daw, 2017).

The CV results, on the other hand, suggest some difference in group variability in the typical direction: retrospective estimates contained 10% more overall variability, replicating the result of Block and Zakay’s (1997) meta-analysis. Therefore, higher variability within a group does not necessarily mean that participants are less confident in their responses. In fact, emerging evidence suggests that people are aware of subjective temporal distortions (Lamotte et al., 2012), and can even track the magnitude and direction of their time estimation errors (Akdoğan & Balcı, 2017). However, it is unclear why people are unable to correct for these errors (Kononowicz et al., 2019).

In our experiment, the length of our tasks yielded interesting results on IU. Underestimation of time occurred in the 58-minute condition (mean = 42.5 minutes, median = 40 minutes, mean IU =.26), but participants were more certain of their responses in this task compared with the 8-minute one (mean = 10.7 minutes, median = 10 minutes, mean IU = .69). This finding replicates Tobin et al. (2010) who had also found smaller IUs in their 58-minute condition compared with their two other durations (12 and 36 minutes).

Future research should investigate error monitoring when timing long durations, as this ability is not currently accounted for in theories of psychological timing (Kononowicz et al., 2019). For example, the AGM, which posits that time estimates are derived from the accumulation of pulses via a pacemaker, has been proposed to account for error-monitoring ability (Akdoğan & Balcı, 2017). People accumulate evidence toward a response based on a drift-diffusion process. This model accounts for observations that the variability of estimates scales with the duration being estimated. However, in our experiment, CV results of the 8-minute condition contained 25% more estimation variability than the 58-minute one. Because participants in our experiment only provided one estimate, we could not examine intrasubject variability. Thus, future research should examine whether intrasubject variability of estimates scales with duration at these time scales.

Concluding remarks and future directions

In addition to highlighting a need for further investigation into estimation uncertainty, our research highlights the importance of the type of tasks used in long-duration research. The visual and memory search task chosen for our experiment had two main advantages. First, it allowed for maximum control over the difficulty of the experiment so that the impact of cognitive load could be carefully examined. Second, performance indices for visual and memory search are well known and could therefore be used to determine whether participants were attentive to the task throughout the experiment. Recently, research in this area has turned attention toward comparing prospective and retrospective estimates under long, daily-life type activities (e.g., Bisson & Grondin, 2020; Tobin et al., 2010). Repetitive tasks like visual and memory search do occur regularly in daily life (e.g., some data entry tasks, scanning and labelling items in a retail environment, repetition in video games). Therefore, the present experiment offers insight into time estimation ability in a type of task that requires repetitive and automatic stimulus–response processing.

The length of the task remains an important topic of future investigation. This was only the third known experiment to compare prospective and retrospective time estimation for a task close to 1 hour. Therefore, predictions were limited in that they were mainly based on literature that focussed on short durations (Block et al., 2010) or on literature that examined very long durations in uncontrolled environments (Tobin et al., 2010). The present experiment also highlighted the need to operationalize the term “long” when referring to task durations. A promising avenue of future research is in determining the point at which there is no longer a paradigm effect. That is, at what point do prospective and retrospective estimates converge? Could this be the point at which a duration should be considered “long”? To answer these questions, future experiments could systematically vary the amount of time people perform the same task.

The data set generated and analyzed during the current study is available from the corresponding authors on reasonable request. The reported experiment was not preregistered.