Introduction

Our bodies receive various types of information simultaneously, and most of this information is obtained from multisensory modalities. Two main behavioral outcomes of multisensory integration have been identified. One of the two outcomes is the redundancy effect, which involves the merging of information across senses. However, in some cases, one sensory modality from multisensory information also competes for preference for consciousness, which involves the sensory dominance effect. Specifically, one modality dominates the others to receive preferential processing and ultimately dominates the awareness and behavior of observers (Chen & Zhou, 2013; Driver & Noesselt, 2008; Hirst et al., 2018; Huang et al., 2015; Koelewijn et al., 2010; Spence, 2011; Stein & Meredith, 1993; Talsma et al., 2010).

In most cases involving spatial information, visual information more frequently dominates other sensory modalities. However, especially in regard to temporal information, auditory information also dominates other sensory modalities (Bresciani et al., 2008; Repp, 2000, 2002; Wang et al., 2019). The sound-induced flash illusion (SiFI), first described by Shams et al. (2000), is a typical phenomenon of information competition. When a certain number of visual flashes are presented together with an unequal number of auditory beeps, either successively or simultaneously within 100 ms, observers misperceive the number of visual flashes (Hirst et al., 2020; Keil, 2020). In other words, auditory information receives preferential processing over visual information (Shams et al., 2000, 2002). SiFI has been divided into fission and fusion illusions (Andersen et al., 2004; Shams et al., 2000, 2002). The fission illusion occurs when a single visual flash accompanied by two auditory stimuli is perceived as two flashes (Shams et al., 2000, 2002); the fusion illusion occurs when two flashes accompanied by one auditory stimulus are perceived as one flash (Andersen et al., 2004). Some studies have suggested that a greater SiFI effect corresponds to a lower visual perceptual sensitivity, which is indexed by d’, and looser subjective judgment criteria, which are indexed by β (Keil, 2020; Kumpik et al., 2014; McCormick & Mamassian, 2008).

Existing studies have shown that SiFI perception varies across individuals (Kaiser et al., 2019; Keil & Senkowski, 2018), ranging from 3% to 86% (Mishra et al., 2007). Studies have also suggested that neural oscillations orchestrate the individual perception of the SiFI effect (Kaiser et al., 2019; Keil & Senkowski, 2018; Lange et al., 2014). Moreover, some studies have illustrated that perceptual processing, including multisensory perception, changes significantly with aging (Cabeza et al., 2004; Grady, 2010; Salat et al., 2009). Older adults display increased multisensory integration compared to younger adults (Hirst et al., 2020). Setti et al. (2011) first reported that beeps had a greater effect on older adults' judgments of the number of visual flashes than on those of younger adults, and subsequent studies have confirmed the results (DeLoss et al., 2013). In addition, older adults are susceptible to SiFI (Hernández et al., 2019), especially fission illusions, across a much wider range of temporal asynchronies (McGovern et al., 2014; Setti et al., 2011), presumably due to older adults’ enlarged temporal window of integration (TWI) (Diederich et al., 2008; Laurienti et al., 2006; Peiffer et al., 2007; Wu et al., 2012). However, one recent study indicated a lower perceptual sensitivity to auditory stimuli in older adults; consequently, they were more dependent on visual stimuli (Sun et al., 2020). At the same time, age-related differences in integration effects have been reported to depend on the sensory modality and the measurement used. Using a race model, Mahoney et al. (2011) found that younger adults but not older adults exhibited a significant increase in the magnitude of coactivation for auditory-somatosensory (AS) and auditory-visual (AV) pairings, whereas older adults presented a significantly greater reaction time (RT) benefit when processing concurrent visual-somatosensory (VS) stimuli. In general, the evidence for audiovisual integration in older adults is controversial (Chan et al., 2018; DeLoss et al., 2013; DeLoss & Andersen, 2015; McGovern et al., 2014; Sun et al., 2020).

SiFI has been revealed to be robust, irrespective of a variety of experimental manipulations, such as the interstimulus time interval (beep SOA (stimulus onset asynchrony) and flash SOA) (Shams et al., 2000, 2002; Shams et al., 2005), stimulus properties (Andersen et al., 2004; Setti & Chan, 2011), spatial location (Abadi & Murphy, 2014), feedback mode (Rosenthal et al., 2009), and the amount of attention allocated (Mishra et al., 2008, 2010). However, all studies on SiFI have been conducted in two-dimensional (2D) space, while we live in a real 3D world with depth. Therefore, a critical factor that has been ignored in current studies is the "depth" in the real environment. Another key factor that has been ignored is the spatial distance from the perceived sensory information to the observers along the depth dimension of the 3D world, which would help observers evaluate a potentially threatening/rewarding stimulus. In real life, the stimuli that appear close might capture observers’ attention to ensure survival; conversely, stimuli far from observers may require less attention (Graziano & Cooke, 2006; Previc, 1998). More importantly, although studies have found that SiFI auditory dominance is not affected when visual and auditory stimuli are presented in the same or different spatial locations in 2D space (Abadi & Murphy, 2014; DeLoss & Andersen, 2015; Innes-Brown & Crewther, 2009), researchers do not clearly understand whether sensory dominance changes with the depth of sensory input.

Early evidence indicated that attention resources decrease with the distance from the observers’ position in 3D space, with the maximal attentional focus in the visual space around the observers’ body (Maringelli et al., 2001). Therefore, fewer attentional resources are allocated to far space than to near space. When attentional resources are limited, the competitive advantage of the dominant modality may increase, leading to a higher cost of the non-dominant modality (Desimone et al., 1990; Desimone & Duncan, 1995). As Michail and Keil (2018) recently reported, participants’ SiFI perception increases when attentional resources are limited. Therefore, because SiFI is an illusion effect in which audition dominates vision, if far-away stimuli are allocated fewer attentional resources than near stimuli, a plausible hypothesis is that auditory stimuli in far space will obtain more competitive advantages than vision; thus, the frequency of the SiFI will increase compared to near-space auditory stimuli combined with far-space visual stimuli. Meanwhile, according to the perception/action model (Goodale & Milner, 1992), the representation of objects in the near space is achieved through movement, which activates the dorsal visual pathway, while the representation of objects in the far space is achieved through perception, which activates the ventral visual pathway. Thus, sensory dominance may also be influenced by 3D space, as multisensory gain (tendency to integrate stimuli) is greater when dominant stimuli are presented in the far space compared to the near space (Noel et al., 2018; Previc, 1998; Van der Stoep et al., 2016a, b). Recent research has discovered sensory dominance changes along the depth dimension in the 3D world, as reported in a study on the Colavita visual dominance effect (Yue et al., 2015). The authors found that the magnitude of the visual dominance effect was significantly increased as long as the visual stimuli were presented in far space, irrespective of whether the auditory stimuli were presented in near or far space. However, researchers have not determined how auditory stimuli in far space affect visual flashes in near space when attention is focused on near space and how auditory stimuli in near space affect visual flashes in far space when attention is focused on far space.

Based on these results, in the present study, by presenting visual and audiovisual stimuli either in near or in far space, we hypothesized that the dominance of audition over vision would change as a function of the spatial location, i.e., attention resources in far space would be less than those in near space, the competitive advantage of auditory modality may increase, and the visual modality would have more difficulty capturing attention resources, which would result in a larger auditory advantage in far space than in near space. In the experiment, visual unimodal stimuli were presented in far or near space, and audiovisual bimodal stimuli were always presented at different distances. If the SiFI auditory dominance effect was affected by spatial dominance in the depth dimension, then we would find that when the auditory stimuli were presented in far space and the visual stimuli were in near space, observers would be more susceptible to SiFI. Moreover, an age-related decline in localized visual and auditory stimuli (Abel et al., 2000; Andersen & Atchley, 1995; Norman et al., 2000) seemed to predict the difference in performance compared to younger adults in this study. Studies have reported older adults’ degradation in vertical and front-back discrimination of auditory space (Abel et al., 2000; Dobreva et al., 2011; Noble et al., 1994) and the depth of visual spatial attention may be somewhat less for older adults (Pierce et al., 2011). Overall, we assumed that when the auditory stimuli were presented in far space, increased auditory dominance would be observed compared to near space, namely, more SiFI. Additionally, due to the decline in older adults’ visual and auditory spatial localization abilities, older adults might be less susceptible to spatial location factors.

Materials and methods

Participants

Thirty-nine older adults and 29 younger adults participated in the present experiment. If the accuracy of any of the non-illusion conditions (F1, F2, F1B1, and F2B2) was less than 60%, the participant was considered to be unable to identify the stimulus properly, and the data from the participant were rejected. Based on this criterion, 14 older adults and five younger adults were excluded from the analysis. Therefore, ultimately, 25 older adults (11 males and 14 females) aged 60–76 years (mean age = 64 years) and 24 college students (eight males and 16 females) aged 18–26 years (mean age = 22 years) were included in the analysis. All participants were naive to the experimental procedure and were paid for their participation in the experiment. All participants were prescreened for self-reported eye disease, neurological disorders (e.g., Alzheimer’s disease, Parkinson’s disease, or stroke), and any significant hearing loss. All participants had normal or corrected-to-normal visual acuity. All participants provided written informed consent according to the standards of the Declaration of Helsinki. The study was approved by the Ethics Committee of the Department of Psychology, Soochow University.
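The exclusion rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `is_excluded`, the dictionary layout, and the example accuracies are hypothetical, and the sketch assumes one accuracy proportion per condition for each participant (the text does not state whether the criterion was checked separately for each spatial location).

```python
# Non-illusion conditions named in the text.
NON_ILLUSION = ("F1", "F2", "F1B1", "F2B2")

def is_excluded(accuracy, threshold=0.60):
    """Return True if any non-illusion condition falls below the threshold.

    `accuracy` maps condition names to proportions correct (0 to 1).
    The data layout is an assumption made for this sketch.
    """
    return any(accuracy[cond] < threshold for cond in NON_ILLUSION)

# Hypothetical participant: F2 accuracy is below 60%, so the data are rejected.
example = {"F1": 0.92, "F2": 0.55, "F1B1": 0.95, "F2B2": 0.90,
           "F1B2": 0.40, "F2B1": 0.60}
print(is_excluded(example))
```

Accuracy in the illusion conditions (F1B2 and F2B1) is deliberately ignored, because low accuracy there reflects the illusion itself rather than an inability to identify the stimuli.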

We used G*Power 3.1.9.2 to conduct a sensitivity analysis of an F test and to evaluate the statistical power in the present study (Faul et al., 2007, 2009). The input parameters were ɑ err prob = 0.05, power (1−β err prob) = 0.80, total sample size = 49, number of groups = 2, and number of measurements = 2, and the output was an effect size f = 0.29, which indicated sufficient power to detect moderately sized effects (Cohen, 1988). We used SPSS 226.0 to perform ANOVA and t tests for accuracy and RTs. We also conducted Bayesian tests (Wetzels & Wagenmakers, 2012) using JASP 0.16.

Stimuli and apparatus

As shown in Fig. 1, all stimuli were presented at two distances from participants: 50 cm for the near condition and 382 cm for the far condition. In near space, all visual stimuli were presented on a ThinkPad X390 laptop computer with a screen resolution of 1,024 × 768 pixels and a refresh rate of 60 Hz. The visual stimuli were white discs (with a radius of view of 2°) presented at a 5° viewing angle below the central fixation point for 17 ms by Presentation software (Neurobehavioral Systems Inc.). The visual stimuli were presented at a 5° angle of view below the central fixation point because the visual stimuli had the greatest illusion in the peripheral field of view when presented with the accompanying auditory stimuli (Shams et al., 2002). The auditory target was a pure tone (3.5 kHz, 75 dB, for 7 ms, from Shams et al., 2002) that was presented at 50 cm via two loudspeakers on either side of the monitor. In far space, visual stimuli were presented on the screen by a projector (EPSON CB-X06E), while speakers were placed on either side of the monitor, both of which were located 382 cm away from the participants. The angle of view (2°) and luminance (9.5 cd/m2) were the same in both near and far spaces. Similarly, the loudness of the auditory stimuli (75 dB) in the near and far space was matched (the decibel meter was placed at the participants’ head position).
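To illustrate what matching the angle of view across distances implies, the following sketch converts a visual angle into the physical extent it subtends at the two viewing distances, using the standard relation size = 2 · distance · tan(angle / 2). The function name `extent_cm` is hypothetical, and the sketch treats 2° as the full angle subtended; if 2° denotes a radius, the values simply double.

```python
from math import radians, tan

def extent_cm(angle_deg, distance_cm):
    """Physical extent (cm) that subtends `angle_deg` at `distance_cm`."""
    return 2 * distance_cm * tan(radians(angle_deg) / 2)

near = extent_cm(2.0, 50)    # near space, 50 cm
far = extent_cm(2.0, 382)    # far space, 382 cm

print(f"near: {near:.2f} cm, far: {far:.2f} cm, ratio: {far / near:.2f}")
```

For a fixed visual angle the required physical size scales linearly with distance, so the far stimulus must be 382/50 = 7.64 times larger than the near stimulus to produce the same retinal size.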

Fig. 1

Flow chart of each trial. The left side of the figure shows the time flow chart of F1, F2, F1B1, F2B2, F1B2, and F2B1, and the right side shows the diagram of the experimental setup (using F1B2 as an example). The behavioral targets included a 7-ms auditory pure tone (blue lines) and a 17-ms visual circle (red lines). Two types of stimuli were used: unimodal visual stimuli (F1 and F2) in which only the visual target was presented and bimodal stimuli (F1B1, F1B2, F2B1, and F2B2) in which the auditory and visual targets were simultaneously presented. The visual and auditory stimuli were presented at different spatial distances (50 cm for the near condition and 382 cm for the far condition). Two types of spatial locations were used: Vnear_Afar refers to visual stimuli presented in near space and auditory stimuli presented in far space, and Vfar_Anear refers to visual stimuli presented in far space and auditory stimuli presented in near space

Experimental design and procedure

The experiment was a 2 (participant group: older vs. younger adults) × 2 (spatial location: Vfar_Anear vs. Vnear_Afar) × 2 (illusion type: fission illusion vs. fusion illusion) mixed design; the participant group was the between-group variable, and the others were within-group variables. For ease of discussion, trials are named using a unified notation in which F = flash and B = beep. For example, F2 means that only two visual stimuli were presented, and F2B1 means that two visual stimuli were accompanied by one auditory stimulus. Because we were mainly interested in the two types of illusions (fission and fusion illusions), our comprehensive analysis focused only on the F1B2 and F2B1 conditions.

At the beginning of the experiment, participants were required to pass a test to determine whether they understood the task and were able to discriminate the beeps or flashes in isolation. The formal experimental procedure is shown in Fig. 1. Stimuli were presented in a dimly lit room, with the monitor and the loudspeakers located 50 cm (near) or 382 cm (far) from the participants. Participants were instructed to place their chin on a chinrest and maintain central fixation throughout the experiment. Two types of stimuli were used: unimodal visual stimuli (F1 and F2) and bimodal audiovisual stimuli (F1B1, F1B2, F2B1, and F2B2). In each trial, one or two visual target stimuli (each lasting 17 ms) were presented, accompanied by zero, one, or two auditory stimuli (each lasting 7 ms). The formal experiment consisted of two sections: the Vnear_Afar condition (visual stimuli were presented 50 cm (near) and auditory stimuli 382 cm (far) from the participants) and the Vfar_Anear condition (visual stimuli were presented 382 cm (far) and auditory stimuli 50 cm (near) from the participants). The participants’ task was to report whether they perceived one or two visual stimuli by pressing the left or right mouse button within 1,500 ms after the stimuli were presented. We counterbalanced the button response, with half of the participants pressing the left button and half pressing the right button if they perceived one flash. In each space condition, each participant completed 384 trials (four blocks of 96 trials), with 64 trials under each experimental condition. The intertrial interval (ITI) was randomized from 400 ms to 700 ms in steps of 100 ms. Half of the participants performed the Vnear_Afar condition first, and the other half performed the Vfar_Anear condition first.
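The trial structure for one space condition can be sketched as follows. This is a hedged illustration rather than the presentation script actually used: the function name `build_blocks` is hypothetical, and the sketch assumes that the six stimulus conditions were distributed equally across blocks (16 per block), that trial order was shuffled within each block, and that the ITI was sampled uniformly from the four permitted values. None of these details are stated in the text.

```python
import random

CONDITIONS = ["F1", "F2", "F1B1", "F1B2", "F2B1", "F2B2"]
ITI_VALUES_MS = [400, 500, 600, 700]  # 400 to 700 ms in steps of 100 ms

def build_blocks(n_blocks=4, trials_per_block=96, seed=None):
    """Build a randomized trial list for one space condition.

    Assumes equal numbers of each condition per block (an assumption
    of this sketch, not a detail reported in the paper).
    """
    rng = random.Random(seed)
    per_condition = trials_per_block // len(CONDITIONS)  # 16 per block
    blocks = []
    for _ in range(n_blocks):
        block = [
            {"condition": cond, "iti_ms": rng.choice(ITI_VALUES_MS)}
            for cond in CONDITIONS
            for _ in range(per_condition)
        ]
        rng.shuffle(block)  # randomize trial order within the block
        blocks.append(block)
    return blocks

blocks = build_blocks(seed=1)
total_trials = sum(len(b) for b in blocks)
f1b2_trials = sum(t["condition"] == "F1B2" for b in blocks for t in b)
print(total_trials, f1b2_trials)
```

With four blocks of 96 trials and six conditions, the counts reproduce the 384 trials per space condition and 64 trials per stimulus condition reported above.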

Results

Accuracy

For conditions in which the visual flash was presented alone, the accuracy of F1 and F2 was greater than 80% in both participant groups and spatial locations (see Table 1); these values were significantly greater than the chance level of 50% (ps < 0.001), which indicated that all participants had the ability to correctly identify the number of flashes. We conducted a 2 (participant group: older vs. younger adults) × 2 (spatial location: Vfar_Anear vs. Vnear_Afar) × 4 (condition: F1B1 vs. F1B2 vs. F2B1 vs. F2B2) repeated-measures ANOVA to further analyze the participants' performance. The main effect of the participant group was significant, F (1, 47) = 6.83, p = 0.012, ηp2 = 0.13, BF10 = 39.45, indicating that the accuracy of younger adults (79%) was significantly greater than that of older adults (70%). The main effect of spatial location was significant, F (1, 47) = 4.32, p = 0.043, ηp2 = 0.08, BF10 = 0.15, indicating that the accuracy of Vfar_Anear (76%) was significantly greater than that of Vnear_Afar (73%). The main effect of the condition was significant, F (3, 141) = 81.68, p < 0.001, ηp2 = 0.64, BF10 = 3.249e+13. Post hoc tests showed that the accuracy in the F1B1 condition (95%) was significantly higher than the accuracy in the F1B2 condition (45%), and the accuracy in the F2B2 condition (95%) was significantly higher than the accuracy in the F2B1 condition (61%), indicating that the auditory dominance effect occurred, namely, the fission and the fusion illusion. The interaction between the participant group and spatial location was significant, F (1, 47) = 8.70, p = 0.005, ηp2 = 0.16, BF10 = 0.30. The interaction between the participant group and condition was significant, F (3, 141) = 3.00, p = 0.033, ηp2 = 0.06, BF10 = 55.59. The interaction between the spatial location and condition was significant, F (3, 141) = 3.64, p = 0.014, ηp2 = 0.07, BF10 = 0.06. The interaction among the spatial location, condition, and participant group was significant, F (3, 141) = 5.90, p < 0.001, ηp2 = 0.11, BF10 = 0.07. Although the Bayes factors for some interactions provided evidence in favor of the null hypothesis, the occurrence of the fission and fusion illusions was supported by the Bayes factors.

Table 1 Mean accuracy of older and younger adults under illusion conditions (M ± SD)

We performed additional analyses to determine how the fission and fusion illusions were affected by the participant group and the spatial location, respectively. For the F1B2 condition (Fig. 2 left), we conducted a 2 (participant group: older vs. younger adults) × 2 (spatial location: Vfar_Anear vs. Vnear_Afar) repeated-measures ANOVA. The main effect of the participant group was not significant, F < 1, BF10 = 2.63. The main effect of the spatial location was significant, F (1, 47) = 5.89, p = 0.019, ηp2 = 0.11, BF10 = 4.98, indicating that the accuracy of Vfar_Anear (49%) was significantly greater than that of Vnear_Afar (41%). The interaction between the participant group and spatial location was significant, F (1, 47) = 9.50, p = 0.003, ηp2 = 0.17, BF10 = 9.14. A simple effects analysis was performed. No significant group difference was observed in the Vfar_Anear condition, t (47) = 2.02, p = 0.29, BF10 = 1.44, or in the Vnear_Afar condition, t < 1, BF10 = 0.31. For the older adults, no significant effect of spatial location was observed, t < 1, BF10 = 0.24. However, for the younger adults, the accuracy of Vfar_Anear (59%) was significantly greater than that of Vnear_Afar (39%), t (23) = 3.86, p = 0.002, Cohen's d = 0.71, CI = [0.08, 0.31], BF10 = 19.12. The results suggested that for older adults, the fission illusion was not affected by the spatial location. For younger adults, the fission illusion was affected by the spatial location, and the fission illusion was much greater in the Vnear_Afar condition than in the Vfar_Anear condition.

Fig. 2

Mean accuracy of the younger and older adults for the fission (left data) and fusion (right data) illusion conditions under the Vfar_Anear and Vnear_Afar conditions. Two types of illusions were shown: fission illusion (F1B2) and fusion illusion (F2B1). F1B2 refers to one visual stimulus accompanied by two auditory stimuli. F2B1 refers to two visual stimuli accompanied by one auditory stimulus. The conditions Vfar_Anear and Vnear_Afar are the same as those described in Fig. 1

For the F2B1 condition (Fig. 2 right), we conducted a 2 (participant group: older vs. younger adults) × 2 (spatial location: Vfar_Anear vs. Vnear_Afar) repeated-measures ANOVA. The main effect of the participant group was significant, F (1, 47) = 7.92, p = 0.007, ηp2 = 0.14, BF10 = 4.35, indicating that the accuracy of younger adults (73%) was significantly greater than that of older adults (51%). The main effect of spatial location was not significant, F < 1, BF10 = 0.25. The interaction between the participant group and spatial location was not significant, F (1, 47) = 2.23, p = 0.142, ηp2 = 0.05, BF10 = 0.47. Based on these results, the fusion illusion was not affected by the spatial location, and the fusion illusion was much greater in older adults than in younger adults.

Signal detection theory analysis

We next determined whether the older and younger adults’ different magnitudes of fission and fusion illusion at different spatial locations were related to changes in flash discriminability and/or criteria for reporting the number of flashes caused by beeps. The data were analyzed according to signal detection theory (Chen et al., 2017; Vanes et al., 2016). Conditions were divided into fission conditions (F1B2 and F2B2) and fusion conditions (F1B1 and F2B1). For fission conditions, two flashes were defined as the signal; therefore, in F2B2, a "2 flash" response was a hit, while a "1 flash" response was a miss. In contrast, in F1B2, a "2 flash" response was a false alarm (FA), and a "1 flash" response was a correct rejection. For fusion conditions, one flash was defined as the signal; therefore, in F1B1, a "1 flash" response was a hit, while a "2 flash" response was a miss. In contrast, in F2B1, a "1 flash" response was a false alarm (FA), and a "2 flash" response was a correct rejection. The parameter d’ represents the discriminability of whether one or two flashes were presented, and ln(β) represents the response criterion or tendency. Both were calculated using the following equations (Macmillan & Creelman, 2004; Rosenthal et al., 2009):

$$ {d}^{\prime }=\mathrm{z}\left(\mathrm{Hit}\ \mathrm{rate}\right)-\mathrm{z}\left(\mathrm{FA}\right) $$
$$ \ln \left(\upbeta \right)=\frac{\mathrm{z}{\left(\mathrm{FA}\right)}^2-\mathrm{z}{\left(\mathrm{Hit}\ \mathrm{rate}\right)}^2}{2} $$

To handle hit or false alarm rates of exactly 0 or 1, for which the z-transform is undefined, a log-linear correction was applied when calculating the hit and false alarm rates (adding 0.5 to the number of hits and to the number of false alarms, and adding 1 to the total number of signal trials and to the total number of no-signal trials). This correction avoids extreme values of d’ and ln(β) (e.g., Vanes et al., 2016).
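The two equations and the log-linear correction can be combined in a short Python sketch. It is meant only as an illustration of the computation described above, not as the analysis code used in the study; the function name `sdt_measures` and the response counts in the example are hypothetical.

```python
from statistics import NormalDist

def sdt_measures(hits, n_signal, false_alarms, n_noise):
    """Return (d_prime, ln_beta) using the log-linear correction.

    hits / n_signal:        e.g., "2 flash" responses in F2B2 (fission)
    false_alarms / n_noise: e.g., "2 flash" responses in F1B2 (fission)
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    # Log-linear correction: add 0.5 to the counts and 1 to the trial totals.
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa                  # d' = z(Hit rate) - z(FA)
    ln_beta = (z_fa ** 2 - z_hit ** 2) / 2  # ln(beta) = [z(FA)^2 - z(Hit rate)^2] / 2
    return d_prime, ln_beta

# Hypothetical counts for one participant in the fission conditions
# (64 F2B2 trials as signal trials, 64 F1B2 trials as no-signal trials).
d_prime, ln_beta = sdt_measures(hits=60, n_signal=64, false_alarms=32, n_noise=64)
print(round(d_prime, 2), round(ln_beta, 2))
```

For the fusion conditions the same function applies with the roles redefined: "1 flash" responses in F1B1 count as hits and "1 flash" responses in F2B1 count as false alarms. Because of the correction, a participant with a perfect hit rate or no false alarms still yields finite values.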

Sensitivity (d’)

We analyzed d’ in the illusion conditions (F1B2 and F2B1) to supplement the accuracy results. For the F1B2 condition (Fig. 3, left panel), we conducted a 2 (participant group: older vs. younger adults) × 2 (spatial location: Vfar_Anear vs. Vnear_Afar) repeated-measures ANOVA. The main effect of the participant group was significant, F (1, 47) = 5.05, p = 0.029, ηp2 = 0.10, BF10 = 3.18, indicating that the d’ of younger adults (1.91) was significantly greater than that of older adults (1.36). The main effect of the spatial location was significant, F (1, 47) = 4.32, p = 0.043, ηp2 = 0.08, BF10 = 1.78, indicating that the d’ of Vfar_Anear (1.77) was significantly greater than that of Vnear_Afar (1.50). The interaction between the participant group and spatial location was significant, F (1, 47) = 5.07, p = 0.029, ηp2 = 0.10, BF10 = 2.86. A simple effects analysis was conducted. In the Vfar_Anear condition, the d’ of younger adults (2.19) was significantly greater than that of older adults (1.35), t (46) = 3.04, p = 0.020, Cohen's d = 0.83, CI = [0.09, 1.58], BF10 = 7.53. In the Vnear_Afar condition, no significant group difference was observed, t < 1, BF10 = 0.43. For the older adults, no significant effect of spatial location was observed, t < 1, BF10 = 0.33. However, for the younger adults, the d’ of Vfar_Anear (2.19) was significantly greater than that of Vnear_Afar (1.63), t (23) = 3.03, p = 0.024, Cohen's d = 0.56, CI = [0.12, 0.99], BF10 = 4.31. The results suggested that older adults' judgment of flashes was not affected by the spatial location in the fission illusion condition. For younger adults, the participants' judgment of flashes was affected by the spatial location, and the fission illusion was much greater under the Vnear_Afar condition than under the Vfar_Anear condition. In addition, the fission illusion of older adults was greater than that of the younger adults under the Vfar_Anear condition, but no difference was observed under the Vnear_Afar condition.

Fig. 3

Mean d’ under illusion conditions for older (left data) and younger (right data) adults. Two types of illusions were shown: fission illusion (F1B2) and fusion illusion (F2B1). The conditions Vfar_Anear and Vnear_Afar are the same as those described in Fig. 1

For the F2B1 condition (Fig. 3, right panel), we conducted a 2 (participant group: older vs. younger adults) × 2 (spatial location: Vfar_Anear vs. Vnear_Afar) repeated-measures ANOVA. The main effect of the participant group was significant, F (1, 47) = 12.35, p = 0.001, ηp2 = 0.21, BF10 = 21.65, indicating that the d’ of younger adults (2.60) was significantly greater than that of older adults (1.70). The main effect of the spatial location was not significant, F < 1, BF10 = 0.26. The interaction between the participant group and spatial location was not significant, F (1, 47) = 1.48, p = 0.23, ηp2 = 0.03, BF10 = 0.41. Based on these results, the participants' judgment of flashes was not affected by the spatial location in the fusion illusion condition, and the fusion illusion was much greater in older adults than in younger adults.

Criterion ln(β)

We analyzed the criterion ln(β) in the illusion conditions (F1B2 and F2B1). For the F1B2 condition, we conducted a 2 (participant group: older vs. younger adults) × 2 (spatial location: Vfar_Anear vs. Vnear_Afar) repeated-measures ANOVA. The main effect of the participant group was not significant, F (1, 47) = 2.84, p = 0.098, ηp2 = 0.06, BF10 = 0.74. The main effect of the spatial location was not significant, F < 1, BF10 = 0.24. The interaction between the participant group and spatial location was not significant, F (1, 47) = 2.89, p = 0.096, ηp2 = 0.06, BF10 = 0.34. For the F2B1 condition, we conducted a 2 (participant group: older vs. younger adults) × 2 (spatial location: Vfar_Anear vs. Vnear_Afar) repeated-measures ANOVA. The main effect of the participant group was not significant, F < 1, BF10 = 0.27. The main effect of the spatial location was not significant, F < 1, BF10 = 0.18. The interaction between the participant group and spatial location was not significant, F < 1, BF10 = 0.08. Thus, the response criterion ln(β) did not differ between participant groups or spatial locations.

Reaction times

We analyzed the RT of the correct trials under the illusion conditions (F1B2 and F2B1). For the F1B2 condition (Fig. 4, left panel), we conducted a 2 (participant group: older vs. younger adults) × 2 (spatial location: Vfar_Anear vs. Vnear_Afar) repeated-measures ANOVA. The main effect of the participant group was significant, F (1, 47) = 11.43, p = 0.001, ηp2 = 0.20, BF10 = 15.58, indicating that the RT of older adults (824 ms) was significantly longer than that of younger adults (732 ms). The main effect of the spatial location was not significant, F < 1, BF10 = 0.26. The interaction between the participant group and spatial location was not significant, F (1, 47) = 1.38, p = 0.25, ηp2 = 0.03, BF10 = 0.38. The results suggested that older adults took longer to make correct judgments than younger adults under the fission illusion condition.

Fig. 4

Mean reaction times of older (left data) and younger (right data) adults under illusion conditions. Two types of illusions were shown: fission illusion (F1B2) and fusion illusion (F2B1). The conditions Vfar_Anear and Vnear_Afar are the same as those described in Fig. 1

For the F2B1 condition (Fig. 4, right panel), we conducted a 2 (participant group: older vs. younger adults) × 2 (spatial location: Vfar_Anear vs. Vnear_Afar) repeated-measures ANOVA. The main effect of the participant group was significant, F (1, 47) = 12.69, p = 0.001, ηp2 = 0.21, BF10 = 69.04, indicating that the RT of older adults (673 ms) was significantly longer than that of younger adults (595 ms). The main effect of the spatial location was not significant, F (1, 47) = 1.33, p = 0.25, ηp2 = 0.03, BF10 = 1.78. The interaction between the participant group and spatial location was significant, F (1, 47) = 8.48, p = 0.005, ηp2 = 0.15, BF10 = 7.05. A simple effects analysis was conducted. Under the Vfar_Anear condition, the RT of older adults (683 ms) was significantly longer than that of younger adults (572 ms), t (46) = 4.50, p < 0.001, Cohen's d = 1.23, CI = [43.81, 117.15], BF10 = 245.19, whereas under the Vnear_Afar condition, no significant group difference was observed, t (46) = 1.84, p = 0.422, BF10 = 1.29. For the older adults, no significant effect of spatial location was observed, t (24) = 1.10, p = 0.28, BF10 = 0.36. However, for the younger adults, the RT of Vnear_Afar (618 ms) was significantly longer than that of Vfar_Anear (572 ms), t (23) = 2.85, p = 0.039, Cohen's d = 0.71, CI = [18.37, 72.88], BF10 = 18.29. Therefore, under the fusion illusion condition, older adults spent more time than younger adults only in the Vfar_Anear condition, but not in the Vnear_Afar condition. In addition, the response time of older adults was not affected by the spatial location, while the response time of younger adults was longer in Vnear_Afar than in Vfar_Anear.

Discussion

The present study adopted the classic SiFI paradigm (Shams et al., 2000, 2002) to investigate how auditory stimuli presented in far space affected visual flashes presented in near space when attention was focused on the near space, how auditory stimuli presented in near space affected visual flashes presented in far space when attention was focused on the far space, and how these effects interacted with aging. Although both groups showed the same response tendency, older adults were more susceptible (lower accuracy and d’) to the fusion illusion than younger adults. Moreover, the magnitude of the fission illusion was enhanced when auditory stimuli were presented in far versus near space for younger adults but not for older adults (Fig. 2). In terms of the RT results, older adults consistently showed a slower response to the fission illusion than younger adults. However, older adults were slower to respond to the fusion illusion than younger adults only under the Vfar_Anear condition. In the fusion illusion condition, younger adults were slower to respond to Vnear_Afar than to Vfar_Anear (Fig. 4). Therefore, the SiFI auditory dominance of younger adults was enhanced under the Vnear_Afar condition, while the older adults’ SiFI was not affected by the spatial location.

The classic SiFI effect (Shams et al., 2000) has been documented in both older and younger adults in the present study and in previous studies. Notably, older adults showed a larger fusion illusion and longer RTs than younger adults in the fission illusion, consistent with previous studies reporting that older adults were more susceptible to the SiFI, showing a larger magnitude of the illusion (DeLoss et al., 2013; McGovern et al., 2014; Setti et al., 2011). Lower perceptual sensitivity (lower d’) might be the cause of a greater SiFI and could therefore be a measure of susceptibility to illusions (Kumpik et al., 2014; McCormick & Mamassian, 2008). Therefore, the results indicated that older adults’ perceptual sensitivity to visual stimuli was lower under the fusion illusion condition, resulting in older adults being more dependent on auditory stimuli and experiencing a greater SiFI. However, the present study did not reveal differences between older and younger adults in the magnitude of the fission illusion, which was inconsistent with previous studies (DeLoss et al., 2013; Setti et al., 2011) and may be due to the effect of spatial location on the fission illusion.
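As an illustration of how the sensitivity (d’) and criterion (β) measures discussed above are typically derived, the following is a minimal sketch of the standard equal-variance signal detection computation. It is not the analysis code used in the present study: the function name, the log-linear correction, and the mapping of trial types onto hits and false alarms (left to the caller here) are all assumptions made for illustration.

```python
from math import exp
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, beta) from raw trial counts.

    Hypothetical helper: which SiFI trials count as "signal" and
    "noise" is decided by the caller and is an assumption here.
    """
    # Log-linear correction (an assumed choice): add 0.5 to each cell
    # so that neither rate can be exactly 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    # z-transform via the inverse standard normal CDF.
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(fa_rate)

    # Sensitivity: distance between signal and noise distributions.
    d_prime = z_hit - z_fa
    # Likelihood-ratio criterion: values below 1 indicate a liberal bias.
    beta = exp((z_fa ** 2 - z_hit ** 2) / 2.0)
    return d_prime, beta
```

Under this sketch, a participant who makes more illusory reports (more false alarms) with the same hit count obtains a lower d’, which is the sense in which a lower d’ is read as greater susceptibility to the illusion.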

According to the accuracy results, auditory information was more dominant in Vnear_Afar than in Vfar_Anear for younger adults in F1B2; the signal detection theory analysis also indicated that younger adults had a lower d’ when the auditory stimuli were presented in far space, suggesting that younger adults were more susceptible to the fission illusion in that condition. As shown in previous studies, attentional resources are mostly centered around the observer (Downing & Pinker, 1985; Gawryszewski et al., 1987; Maringelli et al., 2001); therefore, fewer attentional resources are allocated to far space than to near space. Attentional resources are shared across sensory modalities (Klemen et al., 2009; Sinnett et al., 2007); as a result, when attentional resources are limited, the competitive advantage of the dominant modality potentially increases (Desimone et al., 1990; Desimone & Duncan, 1995). Sinnett et al. (2007) found that the magnitude of the visual dominance effect decreased when available attentional resources were manipulated directly by reducing the perceptual load in the auditory modality; the magnitude of the visual dominance effect was also reduced when the direction of attention was biased to the auditory modality, indicating that overall attentional resources are limited and that stimuli from the visual and auditory modalities compete for them. Taken together, regarding the SiFI, fewer attentional resources were available in far space than in near space, which increased the competitive advantage of auditory stimuli presented in far space and produced a greater SiFI under the Vnear_Afar condition. Yue et al. (2015) found that the visual dominance effect (Colavita effect) increased when visual stimuli were presented in far space. Our study further clarifies that the increase in the illusion effect in far space reported by Yue et al. (2015) is not due to the visual stimuli being presented in far space but rather to the dominant sensory stimuli being presented there.

Furthermore, different neuroanatomical systems seem to be involved in the processing of near and far space. The perception/action model of the ventral/dorsal stream proposes that the ventral stream transforms visual or auditory information into a percept and plays a crucial role in object recognition; the dorsal stream, on the other hand, processes visual or auditory information to prepare the observer to execute actions (Goodale et al., 1991; Goodale & Milner, 1992; Rauschecker, 2012; Shmuelof & Zohary, 2005; Vuilleumier et al., 1998; Wise, 2003). This model might explain why the dorsal and ventral streams differentially support near and far space processing. Since a direct action cannot be performed in far space, the ventral stream is activated; in contrast, the dorsal stream subserves near-space processing because objects can be directly manipulated in near space (Mennemeier et al., 1992; Previc, 1990, 1998; Romanski et al., 2000; Weiss et al., 2003). Thus, the evidence suggests two sensory information processing patterns: ventral/perception/far space and dorsal/movement/near space. Visual and auditory information processing occur along both dorsal/where and ventral/what pathways (Belin & Zatorre, 2000; Mishkin et al., 1983). In addition, studies have shown that in 2D experimental environments, the dorsal visual stream exhibited significantly increased neural activity when vision dominated audition, indicating that the ventral/dorsal pathways are related to multisensory competition (Huang et al., 2015). Therefore, we suspect that in regard to the auditory dominant SiFI effect, when auditory stimuli are presented in far space, the neural activity of the auditory ventral stream, which is mainly responsible for processing perception/far space, would increase. As a result, the competitive advantage of the auditory stimuli would be strengthened in far space, and the magnitude of the SiFI would become larger. The present fission illusion results are consistent with this hypothesis.

For older adults, the fission illusion was not affected by the “depth” factor. One possible explanation is that older adults had increased multisensory integration, which helped counteract the age-related decline in sensory processing according to the compensation mechanism (Hairston et al., 2003; Reuter-Lorenz & Cappell, 2008). The multisensory illusion in older adults may have been so strong that it masked any effect of spatial location on the SiFI. Another possibility is that spatial perception declines with age (Andersen & Atchley, 1995; He et al., 1998; Owsley, 2011). In particular, older adults’ decreased ability to localize visual and auditory stimuli in the environment might result in the lack of a significant difference in SiFI between the Vfar_Anear and the Vnear_Afar conditions for older adults. Atchley and Kramer (2000) conducted an experiment that examined the extent to which age influenced 3D spatial attention. Depth had no effect on older adults, potentially because older adults fail to tightly focus their attention along the depth axis. A subsequent study added to the evidence that the breadth of spatial attention of older drivers is constant across various depths (Pierce et al., 2011). Regarding sound localization ability, both peripheral and central age-related auditory deficits, such as high-frequency hearing loss or a decline in temporal processing, might affect the utilization of the auditory spatial cues involved in sound localization. For instance, high-frequency peripheral hearing loss in older listeners has been associated with degradation in vertical and front-back discrimination in auditory space (Abel et al., 2000; Noble et al., 1994; Rakerd et al., 1998). Therefore, we speculate that auditory and visual localization generally deteriorate in older adults due to age-related central spatial processing deficits. In conclusion, older adults were not sensitive to “depth”; therefore, we suspect that one of the reasons why older adults were susceptible to the SiFI but were not affected by the spatial location is an age-related reduction in the spatial localization capacity.

More importantly, spatial location affected fission and fusion illusions in older and younger adults differently. For the fission illusion, accuracy and signal detection theory results indicated that younger adults were substantially affected by the spatial location, as a significant difference was observed between the Vfar_Anear and Vnear_Afar conditions. However, for the fusion illusion, neither participant group was affected by the spatial location in terms of accuracy, which may be due to the instability of the fusion illusion (Andersen et al., 2004; Shams et al., 2000). Thus, the effect of spatial location on the SiFI varied with age, and the fission and fusion illusion effects differed between the groups. In the RT results, however, the fusion illusion was affected by spatial location; namely, when the auditory stimuli were presented in far space, the younger adults’ RT was longer. This result suggests that the competitive advantage of the auditory stimuli was strengthened and that the illusion increased, leading to a longer RT. Therefore, although accuracy is the main outcome of the SiFI (Andersen et al., 2004; DeLoss et al., 2013; Keil, 2020), future studies should also consider RT results (DeLoss & Andersen, 2015). Nevertheless, the present study showed that an enhanced auditory dominance effect was observed in far space in younger adults, which is critical for the effective perception of unsafe objects or events in far space for better adaptation to the world and an increased chance of survival (Graziano & Cooke, 2006).

Conclusions

In summary, by positioning auditory and visual stimuli in near and far space separately, the present study showed that the fission illusion of younger adults was modulated by the depth of auditory stimuli. Although a significant fission illusion effect was observed regardless of whether the auditory stimuli appeared in near or far space, the fission illusion was significantly enhanced when they appeared in far space. The effect of the spatial location was observed only in younger adults, not in older adults, indicating a decline in attention and spatial localization capacity due to aging.