Context dependency of time-based event-related expectations for different modalities

Expectations about the temporal occurrence of events (when) are often tied with the expectations about certain event-related properties (what and where) happening at these time points. For instance, slowly waking up in the morning we expect our alarm clock to go off; however, the longer we do not hear it the more likely we already missed it. However, most current evidence for complex time-based event-related expectations (TBEEs) is based on the visual modality. Here we tested whether implicit TBEEs can act cross-modally. To this end, visual and auditory stimulus streams were presented which contained early and late targets embedded among distractors (to maximise temporal target uncertainty). Foreperiod-modality-contingencies were manipulated run-wise: visual targets either occurred early in 80% of trials and auditory targets occurred late in 80% of trials or vice versa. Participants showed increased sensitivity for expected auditory early/visual late targets which increased over time while the opposite pattern was observed for visual early/auditory late targets. A benefit in reaction times was only found for auditory early trials. Together, this pattern of results suggests that implicit context-dependent TBEEs for auditory targets after short foreperiods (be they correct or not) dominated and determined which modality became more expected at the late position irrespective of the veridical statistical regularity. Hence, TBEEs in cross-modal and uncertain environments are context-dependent, shaped by the dominant modality in temporal tasks (i.e., auditory) and only boost performance cross-modally when expectations about the event after the short foreperiod match with the run-wise context (i.e., auditory early/visual late). Supplementary Information The online version contains supplementary material available at 10.1007/s00426-021-01564-9.


Introduction
Throughout our lives, we learn about regularities and contingencies in our environment and use expectations about them to optimize and adapt our behaviour. For instance, if we go outside, we are likely to encounter potentially harmful cars on the street rather than the sidewalk (event-related expectations) and in sports, the command 'ready-set-go' provides temporal cues as to when the start signal will occur (time-related expectations). However, often event-and timerelated expectations go hand in hand and predict each other: after putting our favourite dish in the oven, we soon start to expect to smell a delicious scent; however, the longer we do not smell anything the more likely that we forgot to turn on the oven.
In the past, different types of expectations 1 such as spatial, identity-specific, temporal and time-based event-related expectations (for graphical illustration, see Fig. 1) have been examined. Spatial expectations-expectations about where events will happen-are traditionally manipulated using spatial probabilistic cues without providing information about the temporal onset or identity of targets (Posner, 1980;Posner et al., 1980;Zuanazzi & Noppeney, 2020). Identityspecific expectations-expectations about the identity of the upcoming target-have been studied in many contexts, e.g., by manipulating the likelihood that a specific feature (which 1 3 has to be discriminated) will be presented, while controlling for target's temporal onset and spatial occurrence (Puri & Wojciulik, 2008;Summerfield & Egner, 2016). In contrast, temporal expectations (for review, see ) -expectations about when events will happen -have been studied by manipulating the likelihood of target's onset time but here, equally for each event type and spatial position. For instance, target occurrence could be more likely after a short than long interval while the presence of particular to-be-discriminated (task-relevant) stimulus features is unpredictable (Ball et al., 2018a(Ball et al., , b, 2021a. Thus, all the aforementioned studies manipulated one specific dimension (space, identity OR time) while balancing the other two. 2 In contrast, time-based event-related expectations (TBEE) are expectations for a certain target property of the event (what or where) conditioned upon certain points in time (for review, see Thomaschke & Dreisbach, 2015). Thus, the individual time-point always predicts the most likely event at this time-point (foreperiod-event contingencies). Importantly, to study 'pure' TBEEs, neither spatial, Here we hypothetically assume that each paradigm uses two different foreperiods (short and long), two spatial positions (left and right) and two possible target identities (square and triangle). Exemplary trial probabilities are displayed for build-up of leftward expectations (spatial), square expectations (identity) and short foreperiods (temporal). For time-based expectation, we show two scenarios: one in which spatial positions are primed by foreperiods and one in which identity is primed. (Bottom) Bar graph illustration of the condition-wise probabilities (top row: left side stimuli, bottom row: right side stimuli). For spatial, identity and temporal expectations, only this distinct dimension is predictive (space, identity or time) but does not inform about the other dimensions. In contrast, time-based expectations always inform about the most likely event or spatial position based on the foreperiod (time-event contingencies) 2 Note that temporal expectation (TE) studies often present stimuli at the same spatial position across conditions. Thus, participants might create spatial expectations (expecting the fixed position) alongside with temporal expectations. In addition, some studies (Coull & Nobre, 1998;Doherty et al., 2005;Rimmele et al., 2011) directly manipulated spatial and temporal expectations. While they show synergistic effects on the neural level, interactions and synergistic influences on the behavioural level appear to be absent. More importantly, these studies indicate that TE effects are present even in the absence of focused spatial attention/expectation (i.e. TE effect is present whether stimuli are presented at a fixed or random position).
Footnote 2 (continued) 1 3 identity-specific nor temporal expectations are manipulated individually (balanced design; e.g. same amount of short and long foreperiods); only the foreperiod-event contingencies are manipulated. Hence, while time, space and identity themselves are uninformative in TBEE tasks, their contingencies are informative, with time always informing about what or where an event will happen.
Like research on other expectations, TBEEs were mainly studied in the visual domain (Aufschnaiter et al., 2018a, b;Kunchulia et al., 2017;Thomaschke & Dreisbach, 2013Thomaschke et al., 2011;Volberg & Thomaschke, 2017;Wagener & Hoffmann, 2010). In the majority of these studies, TBEEs were manipulated using two different foreperiods (e.g., 600 ms and 1400 ms, indicated by the presentation of a fixation cross) and rendering one of two possible events (e.g., square or circle) more likely to be presented after one than the other foreperiod (e.g. square appears 80% after 600 ms and 20% after 1400 ms and for circle it is the other way around). The central finding across studies was that reaction times (but not necessarily accuracy) were reduced for likely compared with unlikely foreperiod-event contingencies.
Moreover, TBEEs are found in different contexts similarly to TEs which are observed for different tasks, for different modalities, etc. (Ball et al., 2018a(Ball et al., , b, 2020(Ball et al., , 2021aCoull & Nobre, 1998;Cravo et al., 2013;Jepma et al., 2012;Niemi & Näätänen, 1981). For instance, reaction times in TBEE studies do not only improve when the foreperiod primes a specific shape that has to be identified (Thomaschke et al., 2011;Wagener & Hoffmann, 2010), but also when it primes words which have to be discriminated (Thomaschke et al., 2018), a specific task that has to be executed (Aufschnaiter et al., 2018a, b) or a specific spatial position instead of the to-be-distinguished item itself (Wagener & Hoffmann, 2010). Further, behavioural benefits due to TBEE can extend to time points adjacent to the most likely foreperiod (Thomaschke et al., 2011). This is in close resemblance to findings showing that benefits due to TE can also generalize across larger time windows (see, e.g., Ball et al., 2018b;Bouwer & Honing, 2015;Breska & Deouell, 2016). Further, TE as well as TBEE studies both demonstrate that participants are able to learn global statistical regularities (e.g. in TE studies -more short than long foreperiods in one run) for individual runs, hence across trials (Aufschnaiter et al., 2018a;Ball et al., 2018aBall et al., , b, 2020Ball et al., , 2021aJaramillo & Zador, 2011;Kunchulia et al., 2017;Thomaschke & Dreisbach, 2013Volberg & Thomaschke, 2017). This learning process might be based on higher-order sensory memory processes, allowing for the calculation of temporal and event probabilities to improve anticipation and thus, task performance (Zinchenko et al., 2018). In sum, previous research demonstrated that TBEEs can act within the visual domain and effectively prime simple visual features, spatial positions as well as more abstract constructs such as the potentially upcoming task itself.
However, TBEEs so far have been dominated by investigations in unisensory visual events. As our surrounding often stimulates multiple senses, we here addressed the important question whether TBEEs generalise across different sensory systems and can also be created for stimuli of different modalities. To our knowledge, it is currently unresolved whether TBEEs are operating only within one sensory system or can act in a cross-modal, generalised fashion and thus, prime the appearance of a modality-specific event depending on the foreperiod. Recently, we were able to show that TEs (faster and more correct responses for stimuli presented at expected moments in time) differ across modalities (Ball et al., 2018a(Ball et al., , b, 2021a; specifically, effects of TE were often reduced for unimodal auditory and even more for visual unisensory targets (compared with multisensory audio-visual targets). This finding might imply that participants, instead of creating TEs which should facilitate perception of and preparation for all targets independent of modality may have rather created modality-specific TBEE in our TE study. However, given that we manipulated only TEs and not TBEE, our results can also be explained by multisensory interplay (i.e., higher performance for multi-compared with unisensory stimuli; see, e.g., Driver & Noesselt, 2008;Starke et al., 2020) and by the notion that some targets (i.e., audio-visual and auditory) might be easier affected by TEs (as compared with visual targets; see Ball et al., 2018a, b;Wilsch et al., 2020).
In contrast to our multisensory TE experiment, three previous studies (Lange & Röder, 2006;Mühlberg & Soto-Faraco, 2019;Mühlberg et al., 2014) used a multisensory hybrid-design (audio-tactile or visual-tactile experiments) in which the authors manipulated TBEEs (one modality more likely depending on time point) but also TEs (one time point was more likely) as well as identity-specific expectation (one modality was more likely). 3 However, these experiments also do not allow for any strong conclusions about cross-modal TBEEs as either TBEE effects were not analysed (Mühlberg & Soto-Faraco, 2019;Mühlberg et al., 2014) or data analysis was restricted to short foreperiods (Lange & Röder, 2006), i.e., only partially analysed. Please note that this is not a direct shortcoming of the studies themselves as all three studies focussed on cross-modal de-/coupling of temporal attention and not time-based event-related expectations. Further, it is impossible to determine which type of expectation (temporal, identity-specific and/or TBEE) shaped the presented data patterns. For instance, it is conceivable that the mere combination of identity-specific and temporal expectations affected participants' performance. If TBEEs were affecting performance in these studies, the presented descriptive statistics suggest that they potentially decrease reaction times for the more likely (primary) compared with the less likely presented modality (secondary) but only when presented after an expected foreperiod. Hence, while it is possible that TBEEs could prime a certain modality based on the foreperiod and thereby affecting behaviour, previous studies -due to the lack of focus on this topic fell short to provide evidence of the existence of cross-modal TBEEs.
Here we investigate for the first time directly whether participants can create TBEEs for events of different modalities. Please note, that we did not use a cueing approach (explicit manipulation) but rather studied implicitly created TBEEs. To this end, we altered an audio-visual paradigm which we previously established to study implicit TEs (Ball et al., 2018a, b;Jaramillo & Zador, 2011) based on standard designs to investigate TBEEs (Thomaschke & Dreisbach, 2015;Volberg & Thomaschke, 2017;Wagener & Hoffmann, 2010). Note that our paradigm was specifically designed to test for changes in perceptual sensitivity and response times while previous studies focused mainly on reaction time (RT) effects.
In the current study, we balanced the number of events (i.e., modalities) and foreperiod intervals and exclusively manipulated TBEEs. On each trial, we presented a sequence of 15 auditory or visual stimuli with 1 stimulus being the deviant target stimulus that had to be identified (lower or higher frequency than distractors). Based on previous TBEE studies ( Thomaschke & Dreisbach, 2015;Volberg & Thomaschke, 2017;Wagener & Hoffmann, 2010), target stimuli were either presented early or late in the sequence and TBEEs were manipulated runwise. In one experimental run, auditory target stimuli were more likely to be presented early (80%) while visual target stimuli were more likely (80%) to be presented late. In the other run, the foreperiod-event contingency was reversed. We hypothesized that if TBEEs generally affect behaviour, we should observe faster and more accurate responses to targets which are expected dependent on the specific foreperiod (e.g., auditory instead of visual targets and vice versa). In addition, however, TBEEs might also be context-dependent. Since the auditory modality is better suited for information extraction in temporal contexts (Ball et al., 2018a, b;Bertelson & Aschersleben, 2003;Kemény & Lukács, 2019;Repp & Penel, 2002;Wilsch et al., 2020), we further hypothesised that TBEEs might be more pronounced for auditory stimuli.

Participants
We collected data from 31 participants. All participants provided written informed consent and declared to be free of neurological or psychiatric disorders and to have normal or corrected visual acuity. One participant was excluded due to low performance (performance in at least one condition < 25% correct; Ball,et al., 2018a, b for identical criteria), leaving 30 participants for analysis (mean age ± SD: 23.6 ± 4.3, # women: 18, # left-handed: 1). This study was approved by the local ethics committee of the Otto-von-Guericke-University, Magdeburg.

Apparatus
The experiments were programmed using the Psychophysics Toolbox (Brainard, 1997) and Matlab 2012b (Mathworks Inc.). Stimuli were presented on a LCD screen (22'' 120 Hz, SAMSUNG 2233RZ) with optimal timing and luminance accuracy for vision researches (Wang & Nikolić, 2011). Resolution was set to 1650 × 1080 pixels and the refresh rate to 60 Hz. Participants were seated in front of the monitor at a distance of 102 cm (eyes to fixation point). Responses were collected with a wireless mouse (Logitech M325). Accurate timing of stimuli (≤ 1 ms) was confirmed prior to the experiment with a BioSemi Active-Two EEG amplifier system connected with a microphone and photodiode.

Stimuli
Stimulus sequences on each trial consisted either of auditory (pure tones) or visual stimuli (circles filled with chequerboards). Chequerboards subtended 3.07° visual angle, were presented above the fixation cross (centre to centre distance of 2.31°) and on a dark grey background [RGB: (25.5 25.5 25.5)]. The fixation cross (white) was presented 2.9° above the screen's centre. As in our previous reports (Ball et al., 2018a, b), sounds were presented from one speaker placed on top of the screen at a distance of 7.06° from fixation, 4.76° from chequerboard's centre, and 3.22° from chequerboard's edge. Chequerboards and pure sounds were used as targets and distractors. The distractor frequencies were jittered randomly between 4.6, 4.9, and 5.2 cycles per degree for chequerboards and between 2975, 3000 and 3025 Hz for sounds. Visual and auditory target frequencies were individually adjusted to a 75% accuracy level at the beginning of the experiment. Hence, targets-although the same type of stimulus (chequerboard/pure sound)-were either lower or higher in frequency compared with distractor frequencies.
Furthermore, the intensities for both target and distractor chequerboards and sounds were varied randomly throughout the stimulus sequences. The non-white checkers were jittered between 63.75, 76.5, and 89.25 RGB (average grey value of 76.5 RGB). The sound intensities were jittered between 20%, 25%, and 30% of the maximum sound intensity [average of 25% = 52 dB(A)].

Procedure
Participants were seated in a dark, sound-attenuated chamber. Each experiment consisted of 3 parts; first, participants completed 1 or 2 training runs (32 trails per run) to familiarize themselves with the task. Next, they completed two threshold determination runs during which the frequency of the sound and chequerboard target stimuli was adjusted to 75% correct responses. Finally, participants completed 6 experimental runs (160 trials per run, 960 trials total). After completion of the experiment participant were interviewed whether they realised the foreperiod-modality contingencies. The whole procedure (including instructions etc.) took approximately 2.5-3 h per participant.
Each trial consisted of a stimulus sequence of 15 stimuli (50 ms stimulus and 100 ms gap), 4, 5 , followed by a response window (1500 ms) and an inter-stimulus-interval (350-1350 ms) in which no response was recorded. The stimulus sequence of each trial was either auditory or visual and participants were informed that on each trial a target (lower or higher frequency than distractors) was embedded in the sequence (see Fig. 2, top). Participants were asked to discriminate the frequency of each target as quickly and accurately as possible. Participants held the response device (i.e., mouse) with both hands, while placing their left/right thumbs on the left/right mouse buttons, respectively. Each button was used for one of the two response options (low/ high frequency; key bindings were counterbalanced across participants). The response recording started with the onset of the first stimulus of the sequence and ended 1500 ms after sequence's offset (response window). Only the first button press was recorded. In case no button was pressed, the trial was repeated at the end of each run's quarter (mean of repeated trials across participants: 2.2 ± 2.7% SD).
In each run, we manipulated TBEEs by altering the 'foreperiod-event' contingencies and thus, the context within each run. TBEEs imply that one event (here the target modality) is more likely after one foreperiod and the other target is more likely after the second foreperiod. These likely targets at each foreperiod will be referred to as 'primary targets'. The unexpected targets at each foreperiod will be referred to as 'secondary targets'. In 'auditory early' runs, auditory target stimuli were more likely to appear at the early position (80%), while visual stimuli were more likely to appear at the late position (80%). Hence, in auditory sequences targets appeared in 80% after a short and in 20% after a long foreperiod. In visual sequences, targets appeared in 20% after a short and in 80% after a long foreperiod. In 'visual early' runs, these likelihoods were reversed (see Fig. 2, bottom right column). Note that our previous work showed that basic TEs for early and late positions differ (Ball et al., 2018a, b); as long as the target stimulus is not presented, expectation exponentially increases that the target will soon be presented (hazard rate). Thus, accuracy is e.g. higher for the late position. To counter the hazard rate effect, we used different late positions [11th (31.25%), 13th (12.5%) or 15th (6.25%)] to render expectations for the early and late positions more alike.
Further note that the individual parameters of modality and foreperiod were always balanced within each run. In each run, 50% of foreperiods were short (early position: 3 rd position) and 50% were long (late positions). Further, 50% of sequences and thus targets were auditory and 50% were visual. Thus overall, each of these events were equally likely (see Fig. 2, bottom left and middle column). This is an important distinction to the spatial and temporal expectation literature in which probabilities of time and space are individually manipulated. In TBEE studies, these factors are balanced and only time-event contingencies are manipulated.

Data analyses
For analyses, we used Matlab 2017b (Mathworks Inc.) and JASP (v 13.0.0). To increase comparability across studies, we applied the most common outlier exclusion criteria as 4 Please note that the stimulus sequence results in entrainment. However, entrainment was identical among conditions (i.e. same stimulation frequency across trials). Further, cross-modal influences of entrainment are absent within and across trials in the present paradigm: Within trials, we only presented one modality making crossmodal influences impossible. Across trials, entrainment cannot influence the probability of events, the only factor allowing to create expectations in our experiment based on global, statistical regularities (e.g. auditory targets are more likely to appear early and visual are more likely to appear late). Please also see our previous work on temporal expectations (with a very similar paradigm) in which we discuss potential effects of entrainment for paradigms like the present one (Ball et al., 2018a(Ball et al., , b, 2021b. 5 In paradigms with stimulus sequences, participants might either use time or the ordinal position to identify the stimulus. The stimulation frequency in the present study (7.5 Hz) and in our previous studies on temporal expectations (5-15 Hz) renders stimuli virtually uncountable. Note that counting would also introduce a dual-task (first count then discriminate target's frequency) which would be highly detrimental for task performance (missing target's frequency due to counting). Further, one might argue that participants do not count explicitly, but rather the brain counts implicitly. In this case, all TE results might be based on "counting" as theories about timing networks typically imply some kind of pacemaker/oscillation/etc. (for extensive discussions on "timing vs. position", see Ball et al., 2018bBall et al., , 2021a. used in previous studies on TBEEs (Aufschnaiter et al., 2018a, b;Kunchulia et al., 2017;Thomaschke & Dreisbach, 2013Thomaschke et al., 2011;Volberg & Thomaschke, 2017;Wagener & Hoffmann, 2010): for each condition, we excluded RTs below and above three times the standard deviation around the mean and response times below 100 ms, resulting in an average exclusion of 2.3 ± 2.9% of trials per condition.
To decrease familywise error rates of the analysis and since context dependency of modality-related TBEEs was the main focus of this investigation, we calculated the difference in performance between the primary (i.e., expected, . Stimuli were presented for 50 ms with a gap of 100 ms in between successive stimuli. Target stimuli (adjusted to a threshold of 75% accuracy) are highlighted in this example with a red contour (not present in the experiment). Target stimuli were either higher or lower in frequency as compared with distractors and were presented either early (3rd position) or late (11th, 13th or 15th position) in the sequence. After the stimulus sequence ended, participants had another 1500 ms for providing their response. (Bottom) Depiction of the probabilities of modality, foreperiod and modality-foreperiod contingencies within each run (auditory or visual early). In both run types (auditory early [upper row] and visual early runs [lower row]), there was an equal probability (50%) for each modality and foreperiod to be presented on a given trial (left and middle plots). Only the modality-foreperiod contingencies (right plots) were altered (80% vs. 20%) across runs to exclusively modulate timebased expectations (and not temporal or identity expectations) 1 3 more likely) and secondary (i.e., unexpected, less likely) condition and used this difference for statistical analyses (Eimer & Kiss, 2008;Luck & Gaspelin, 2017;Näätänen et al., 2004;Sawaki et al., 2012;Thomas & Zumbo, 2012;Zimmerman et al., 1993). This difference was calculated for each foreperiod (short, long), context (auditory early [= higher likelihood for auditory early and visual late], visual early [= higher likelihood for visual early and auditory late]) and run half (first, second). We included the factor run half to test whether TBEE improves over time as participants have to re-learn the time-based regularities in each run 6 . The factor context was included as participants might learn one time-modality contingency (e.g., auditory early and visual late) but not the other. Finally, we also added the betweensubject factor first run ('auditory early', 'visual early') to account for potential interaction effects with factor context, i.e., a bias introduced by the likelihoods the participants encountered first. We conducted two repeated-measures ANOVAs, one for accuracy and one for response times, with within-subject factors foreperiod, context and run half and between-subject factor first run. If required, ANOVA results were Greenhouse-Geisser (p GG ) corrected. Further, we used directional t-tests (TBEE score second run half > first run half) in JASP for post hoc tests and Bonferroni corrected the p value (pBF).

Results
The results of the repeated measures ANOVA support the notion that modality-related TBEEs are context-dependent. Although the effect of context was non-significant (auditory early run vs. visual early run: F(1,28) = 3.681, p = 0.065, η p 2 = 0.116), it significantly interacted with the factor run half (context * run half: F(1,28) = 6.166, p = 0.019, η p 2 = 0.18). TBEE effects increased over time in 'auditory early' runs (t(29) = 2.109, pBF = 0.044) while this effect was non-significant in 'visual early' runs (t(29) = -1.487, pBF = 1, see Fig. 3A). All remaining effects were non-significant (all F < 2.596, all p > 0.118; for effects specifically including factor first run: all p > 0.184). Nevertheless, we present group-mean averages from all conditions in Fig. 3B for better comparison with the response time results (see below and Fig. 3C and D for RT results).
After the experiment, none of the participants reported to have noticed that we manipulated foreperiod-modality contingencies within and across runs. This held true, even after being informed about the experimental manipulation.
Finally, we would like to direct readers' attention to the general pattern of accuracies and response times in Fig. 3C and D. Depending on the context (auditory or visual early), accuracy-related time-based attention appears to either increase (auditory early) or decrease (visual early) over run half. Generally, expectation effects appear to be reversed in 'visual early' runs, indicating that performance was higher for the secondary condition. More importantly, even if TBEEs exist in the beginning of the 'visual early' runs (see long foreperiod), participants later (2nd run half) focus on the secondary modality at each foreperiod which is also evident for response times.

Discussion
Here we tested whether implicit time-based event-related expectations (TBEEs), expectations for a certain target stimulus (here a certain modality) contingent upon a specific temporal foreperiod, are generalizable to cross-modal contexts. We found that modality-related TBEEs were observable but context-dependent. Accuracy and response times improved over the course of a run for the primary (expected, more-likely) compared with the secondary (unexpected, less-likely) target condition but only in 'auditory early' runs (80% likelihood that auditory targets are presented early and visual targets are presented late within a sequence of 15 stimuli). In 'visual early' runs, performance rather improved over time for the secondary instead of the primary modality. Finally, response times improvements in 'auditory early' runs were mainly driven by targets appearing after a short foreperiod.
Our results show for the first time that humans are able to create TBEEs for stimuli of different modalities, thereby significantly extending previous studies restricted to the visual domain (Aufschnaiter et al., 2018a, b;Kunchulia et al., 2017;Thomaschke & Dreisbach, 2013Thomaschke et al., 2011;Volberg & Thomaschke, 2017;Wagener & Hoffmann, 2010). Moreover, previous studies were typically designed to measure response time differences (see, e.g., Volberg & Thomaschke, 2017), resulting in the use of rather 'simple' designs (i.e., the foreperiod was followed by only a single, clearly visible stimulus such as a square or circle) while accuracies were at ceiling. Here, we used a more complex paradigm in which targets were embedded among-and had thus to be distinguished from-distractors. As a result, we were able to show that TBEEs can affect accuracies -and thus, the perception of events as well as response times. In addition, we show that modality-related TBEEs do not only affect that a certain modality is expected at a certain point in time, but also that this expectation can influence performance in an orthogonal task (frequency discrimination).
The crucial finding of our study is that participants appear to have unknowingly (implicitly) learned time-modality contingencies but only when the auditory target was more likely presented early and the visual presented late as indexed by performance increases for the primary target. As we have already discussed previously (TE; Ball et al., 2018a, b), the auditory system has a higher temporal resolution as compared with the visual system, which might render the auditory system more prone to detect and utilise temporal information (see also Bertelson & Aschersleben, 2003;Kemény & Lukács, 2019;Repp & Penel, 2002;. This notion was very recently further supported by an MEG study, showing "that spatial attention has a stronger effect in the visual domain, whereas TE effects are more prominent in the auditory domain" (Wilsch et al., 2020). In line, researchers demonstrated asymmetric perceptual learning effects with spatial information being transferred from the visual to the auditory domain but temporal information being transferred from the auditory to the visual domain and not vice versa (McGovern et al., 2016).
It is possible that the higher temporal acuity of the auditory system allows for registering early auditory (as compared with visual) targets more easily and reliably. This assumption would be in line with the overall descriptive statistics presented in Fig. 3. It appears that after a short foreperiod, expectations about early auditory targets are always stronger because even in the 'visual early' runs, the secondary auditory target was expected more strongly. Additionally, expectations for early auditory targets increased B Descriptive statistics of all factor levels (foreperiod, run half and context) for differences in accuracy. C and D Same descriptive statistics as A and B for response times. Note that we plot the TBEE effect hence, the difference between the primary and secondary condition. Positive values always indicate higher performance (higher accuracy or faster response times) in the primary compared with the secondary condition. Error bars are standard errors of the mean. Bright grey bars indicate data of the first half and dark grey bars the second half of the run. Asterisks indicate significant, Bonferroni corrected post hoc results (pBF < .05) for the difference between first and second run half over time (1st vs. 2nd run half), irrespective of the auditory stimulus being the primary or secondary target after short foreperiods. Given that we found no significant influence of the first run (visual or auditory early) on performance, it is more likely that this is a general effect rather than a transition of expectations across runs (which should depend on regularities in the first run). Further, general expectations of early auditory targets appear to result in two opposite effects depending on the run type: over the time course of the run, they boost expectations about late visual targets (in 'auditory early' runs) and eliminate previously correctly established expectations about late auditory targets (in 'visual early' runs). Thus, irrespective of context, participants expected auditory targets early and visual targets late. Note again, that we used identical stimuli in both runs and only changed the foreperiod-target modality contingencies. Hence, the differential pattern observed for the two run types indicates that participant's intrinsic expectations affected performance and when matching with the existing statistical regularities, improved the behavioural outcome.
The specificity and effectiveness of time-based expectations might even be further broken down to the expected early auditory target in 'auditory early' runs. This was the only condition for which we (descriptively) show an improvement of accuracies as well as response times. For the expected (primary) visual target after the long foreperiod (within the auditory early run), only accuracies appear to be affected and TBEE-related improvements were restricted to the second half of the run. Note that this accuracy pattern suggests that participants can learn time-based event regularities for a specific context (i.e. 'auditory early and visual late'), but that they might also be prone to establish even stronger TBEE for a very specific combination of foreperiod and modality (expected auditory early target). The latter suggestion would be in line with our previous finding of stronger auditory than visual TE effects after short foreperiods (Ball et al., 2018a, b). Further, our current and previous results indicate that the use of visual information -especially related to short foreperiods -in tasks including expectations in the temporal domain (be it pure TE or TBEE tasks) appears to be somehow suppressed by the presence of the more informative auditory modality.
Perception of early expected auditory targets facilitates the creation of TBEE for early auditory targets but also for expected late visual targets within the same run. However, in 'visual early' runs, early auditory targets still appear to be prioritized, a subjective expectation that strengthens over time and even eliminates correct TBEE for the late auditory target. Thus, participants do not exhibit a bias towards the auditory modality in general (which would have facilitated auditory performance irrespective of the foreperiod); rather, any expectation at the short foreperiod (be it correct or incorrect) appears to strengthen over time and determines which event is expected at the late position. Although, our previous work (Ball et al., , 2021b indicated that explicit temporal knowledge has rather little impact on performance in temporal tasks, it is possible that explicitly attending a certain modality might affect multisensory TBEE. As mentioned in the introduction, if TBEE was affecting performance in previous studies (Lange & Röder, 2006;Mühlberg & Soto-Faraco, 2019;Mühlberg et al., 2014), their descriptive statistics suggest that it improves performance for the expected compared with the unexpected modality (irrespective of the specific modality) after short foreperiods. Given that participants were forced in these studies to actively attend to a certain modality and time point this might have minimized self-determined 'modality preferences', resulting in a stable primary vs. secondary modality and thus, context-independent TBEE effect. However, using such manipulation of attention, these studies also reduce incidental, statistical learning. At this point, future studies are required to determine potential differences between TBEEs based on different paradigms and explicit vs. implicit learning of regularities.
It is worth speculating, how our results relate to realworld behaviour. For instance, in conversations people sometimes pause. During the pause, we first focus on the auditory information (does the speaker continue). However, the longer the pause, the more we start to monitor them visually, searching for clues whether they want to continue or whether it's socially acceptable to speak. On streets, we typically listen whether a car approaches before inspecting the street visually. In talks, we expect the speaker to first present something orally before continuing to present something visually. Even evolutionary, it is reasonable to assume that auditory information often precedes visual information. Hearing is based on 360° while visual information is restricted to 180°. During night, we first listen whether e.g. a dangerous animal is approaching before directing our visual attention to the area of interest. The same is true when we are addressed by someone: first we listen and then we turn to them. In addition, our brains typically process auditory stimuli ten times faster than visual stimuli (see e.g. Starke et al., 2020). In sum, there might be social as well as evolutionary components contributing to generally expect and attend to auditory information first, followed by later visual expectations.
To close with, we would like to discuss potential commonalities between TBEEs and TEs. For instance, Mühlberg and Soto-Faraco (2019) argued that TBEE effects -unlike TE effects -do not depend on the deployment of temporal attention to specific points in time. This argument partially derived from the finding that TBEEs generalise to other time points than the most likely foreperiods (Thomaschke et al., 2011). However, if a generalisation of TBEE across a broader time window implies the absence of the guidance of temporal attention to relevant points in time, than this would also cause problems for the interpretation of TE results. The spread of TEs (and thus, the time window in which temporal attention is potentially deployed) is rarely analysed in TE studies. However, when analysed, results indicate that TEs can operate in a broader time window instead of being exclusive to one specific point in time (see e.g. Ball et al., 2018b;Bouwer & Honing, 2015;Breska & Deouell, 2016;Jaramillo & Zador, 2011). Additionally, TBEE as well as TE effects are typically explained with the 'expectation-driven guidance of temporal attention' hypothesis, which appears to be the most prominent hypothesis in both research areas (Mühlberg & Soto-Faraco, 2019;Nobre & van Ede, 2017;Thomaschke & Dreisbach, 2015;Wagener & Hoffmann, 2010). If both types of expectations involve the guidance of temporal attention, this also implies that temporal attention is often shifted implicitly which is in line with the present results and other proposals of implicit attentional shifts in the temporal, spatial and feature domain (Addleman et al., 2019;Balci & Simen, 2014;Ball et al., 2020Ball et al., , 2021bBolger et al., 2013;Melcher et al., 2005;Thomaschke & Dreisbach, 2015;Thomaschke et al., 2011). In sum, both types of expectations appear to share commonalities, with larger performance benefits due to temporal attention at the most likely points in time but also, albeit smaller effects at flanking time points (see e.g. Jones et al., 2002;Thomaschke et al., 2011).
Further, Mühlberg and Soto-Faraco (2019) argued that TBEEs are solely explained by motor preparation instead of perceptual facilitation effects, which is certainly debatable. Most TBEE studies -as specifically pointed out by the authors -were not designed to test perceptual effects but specifically aimed at affecting response times which was later linked to motor preparation effects (Thomaschke & Dreisbach, 2013;Volberg & Thomaschke, 2017). Here we show that both, accuracies and response times were boosted by TBEEs indicating perceptual facilitation in addition to response preparation (Pashler, 1989). More importantly, even studies on TEs commonly only modulate response time effects in the absence of accuracy effects (for review, see ). However, when tested whether these modulations of behaviour by TE indicate motor preparation or perceptual facilitation effects, previous modelling studies reported to be unable to distinguish between these two concepts (Jepma et al., 2012) or reported that response preparation and strategy shifts (instead of perceptual facilitation) likely caused RT modulations van den Brink et al., 2020). Together, the present and earlier results suggest that the specific study design determines whether motor preparation or perceptual facilitation is affected by both TEs and TBEEs. These findings, do not only highlight commonalities between TE and TBEE, but also highlight that results have to be interpreted cautiously when only response speed but not accuracy is modulated.
The question arises whether these commonalities between TBEEs and TE imply that both have the same underlying mechanism. It may well be that some form of time interval estimator is active in both cases and-by means of timebased expectations-gets tuned to relevant foreperiod durations. These expectations could entail that there are e.g. two equally likely foreperiods or that one specific foreperiod is most likely. Each foreperiod duration is then linked to a secondary event-related expectation (e.g., spatial position and/ or identity). Hence, probabilistic knowledge (be it explicit or implicit) about the foreperiod would in combination with event-related expectations prepare for the upcoming event directly. When the expectation matches the current presentation, performance is enhanced. Note that the expectation itself does not have to be correct (i.e., be based on the experimental paradigm) but could also be subjective. In the latter case, it is still open why exactly participants create false expectations for events (like partially present in the current study) and temporal structures.
In any case, the proposed mechanism would actually account for most findings across research fields. For instance, tying expectations about more likely foreperiods to certain modalities, would explain the asymmetric modalityspecific TE effects we found in our previous studies (Ball et al., 2018a(Ball et al., , b, 2021b. Always, expecting auditory stimuli early and visual late, as in the present experiment would improve primary target performance in auditory early runs (correct expectation) and secondary target performance in visual late runs (false expectation). Additionally, if e.g. the target identity (e.g., always visual) and spatial position are fixed, any match or mismatch in expectation would only be related to the temporal domain (higher performance in expected compared to unexpected temporal trials). Note that target matches in multiple dimensions could result in interactions of combined expectation types but do not necessarily have to. Finally, some expectations might be more crucial to the task at hand. For instance, if participant's attention is drawn away from a target position (due to spatial expectation), visual stimuli are strongly degraded at the unexpected position, rendering temporal attention irrelevant for task performance (in line with findings in spatio-temporal expectation studies; see He et al., 2020;Rohenkohl et al., 2014). In sum, participants might always create some form of combined expectations (e.g., spatial and temporal); however, depending on each expectation type's task relevance, only one might influence behaviour.

Conclusion
Our results strongly suggest that TBEEs extend to crossmodal contexts, but depend on the creation of event-specific temporal expectations, thereby enhancing performance. This performance enhancement can be linked to perceptual in addition to motor facilitation. Most importantly, our results indicate that TBEEs can be context-dependent and seem to be driven by auditory information, if available, especially after short foreperiods. This has crucial implications for multisensory studies of TEs as participants might develop TBEEs instead of TEs if modalities 'compete for temporal expectation and attention' within one experiment. More importantly, the competition which appears to be driven by targets presented after short foreperiods results in modality-specific TBEEs (i.e., auditory early) that only improve behaviour cross-modally when it matches with the global context (i.e., statistical regularities) within one run (i.e., auditory early/visual late).
Author contributions FB: conceptualization, methodology, formal analysis, investigation, writing-original draft, writing-review and editing, visualization, supervision, project administration, funding acquisition JA: investigation, visualization, writing-original draft, writing-review and editing TN: conceptualization, methodology, writing-original draft, writing-review and editing, supervision, funding acquisition.
Funding Open Access funding enabled and organized by Projekt DEAL. This work was funded by the European Funds for Regional Development (EFRE), ZS/2016/04/78113, Center for Behavioral Brain Sciences -CBBS.

Data availability statement
The authors declare that the analysed data as well as potential supplementary analyses (not reported in the main manuscript) are available in the supplementary information files.
Code availability Data were analysed with JASP (freely available) and Matlab 2017b (Mathworks Inc.). Any relevant code is available upon request.

Conflict of interest
The authors declare no competing financial and non-financial interests, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical approval This study was approved by the local ethics committee of the Otto-von-Guericke-University, Magdeburg.
Consent to participate and publish Informed consent was obtained from all individual participants included in the study.

Open Practices Statement
The data and materials are available in the supplementary information files and upon request (e.g., codes). None of the experiments were preregistered.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.