Perception of the real world depends on the integration of multisensory information (e.g., from the visual, auditory, and/or tactile senses). In this process, the information provided by each sense is utilized optimally (Alais, Newell & Mamassian, 2010; Ernst & Banks, 2002). In this way, the integration of multisensory information can reduce perceptual ambiguity (Sumby & Pollack, 1954) and increase the accuracy of perception. Multisensory integration is conducted in multiple, lower-to-higher stages, with the initial perception being altered by subsequent integration (Talsma, Senkowski, Soto-Faraco & Woldorff, 2010). Notably, this altered perception frequently differs from the physical input.

In this study, we examined audiovisual integration. Sounds (defined here simply as the information received by the auditory sense) carry various types of information about the perceived real world. Auditory stimuli frequently alter visual perception through audiovisual interactions. For example, in the case of “flutter-driving” (Shipley, 1964), the auditory flutter rate dominates the visual flicker rate, since the temporal resolution of the auditory sense is superior to that of the visual sense. Sounds also alter the perception of motion: The presentation of auditory stimuli alters the perceived trajectory of motion in a process known as “stream or bounce” (Sekuler, Sekuler & Lau, 1997). In addition, a change in the pitch (ascending or descending) of an auditory stimulus biases the perceived direction of visual motion upward or downward (Maeda, Kanai & Shimojo, 2004). Moreover, the alternation of sound locations can cause a static object to be perceived as moving (Hidaka et al., 2009). Sounds affect the perception of velocity, as well (Kafaligonul & Stoner, 2010; Manabe & Riquimaroux, 2000; Takeshima & Gyoba, 2011).

However, although audiovisual interactions involving visual properties other than size have been studied extensively, the effects of auditory stimuli on the visual perception of size have not yet been thoroughly examined. Size is an important property in the perception of one’s surroundings. We assume that size perception can be affected by the interactive processing of multiple sensations. Size perception is also connected with the perception of other properties (e.g., velocity or weight). For example, the velocity of a large object can be misperceived as slow, and that of a small one can be misperceived as fast (Brown’s law: Brown, 1931). Similarly, when we lift two objects of the same weight, we misperceive the smaller object as being heavier (size–weight illusion). Thus, the effects of audiovisual interactions on size perception have important theoretical and practical implications.

To a certain extent, research on the role of multisensory integration in size perception is already well underway. Gepshtein and Banks (2003) have studied the interaction between vision and haptics as this relates to size perception. They reported that, in the process of multisensory information integration, haptic size perception information dominates the visual component in cases in which size discrimination by the visual modality alone is difficult. Auditory stimuli do not seem capable of affecting perceived visual size, because the spatial resolution of the visual sense is higher than that of the auditory sense (Welch & Warren, 1980). However, when a target to be discriminated according to size is more distant, the tactile sense cannot contribute to the target size information. In this case, auditory size information may be utilized to complement the visual size information. Carello, Anderson and Kunkler-Peck (1998) provided evidence supporting this hypothesis, reporting that differences in rod length can be discriminated by the sounds of the objects dropping. When the target is far away, the acuity of the visual sense is immediately reduced, as is the available visual information (particularly at the edge of the target). Sounds can improve the intensity of visual stimuli (e.g., Vroomen & de Gelder, 2000), such that auditory stimuli can affect the processing of ambiguous visual stimuli. One would expect a substantial interaction between the auditory and visual senses in the context of size perception. For example, Lipscomb and Kim (2004) showed that object sizes tended to correspond with sound intensities. In particular, larger objects were matched more readily when sound intensity was higher. A 3-D interpretation (e.g., Purves, Wojtach & Lotto, 2011) can also be applied to the relationship between size discrimination and sound intensity. 
The loudness of sound generally decreases with increasing distance from the sound source (distance decay); thus, sounds produced by near objects are likely to be perceived as being louder than those produced by far objects. Likewise, the retinal sizes of near objects are larger than those of far objects. Together, these facts indicate the possibility that sound intensity may modulate the discrimination of object sizes. Consequently, observers may estimate object size from sound intensity easily, perhaps even unconsciously. Object size discrimination is generally determined by visual information because of its high spatial resolution; however, when unisensory information resources run short, size discrimination often becomes inaccurate. In such cases, other sensory information is utilized to compensate for this deficit in precision (e.g., Ernst & Banks, 2002; Wada, Kitagawa & Noguchi, 2003). Auditory information may also operate as a modifier or gain factor for visual size perception. Therefore, it is possible that size perception could be affected by manipulations of auditory stimulus intensity (Takeshima & Gyoba, 2011).
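
The distance decay described above can be made concrete under a simplifying assumption that is not part of the original argument: an ideal point source radiating in a free field, with no reflections. The symbols L_1, L_2, r_1, and r_2 are introduced here only for illustration.

```latex
% Assumption: ideal point source in a free field (inverse-distance law).
% L_1, L_2: sound pressure levels (dB) at distances r_1, r_2 from the source.
L_2 = L_1 - 20 \log_{10}\!\left(\frac{r_2}{r_1}\right)

% Doubling the distance (r_2 = 2 r_1):
L_2 = L_1 - 20 \log_{10} 2 \approx L_1 - 6.02\ \mathrm{dB}
```

Under this idealization, each doubling of distance lowers the level by about 6 dB, which is the sense in which a louder sound is consistent with a nearer source. Real listening environments deviate from this rule because of reflections and absorption.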

Experiment 1

In Experiment 1, participants visually discriminated between three stimulus sizes under three conditions: visual only (without sounds), visual with low-intensity sounds, and visual with high-intensity sounds. We investigated whether auditory stimuli would modulate visually perceived size.

Method

Participants

A group of seven observers (one woman, six men) participated in Experiment 1. Both their vision and audition were normal or corrected to normal. None of the participants had been informed of the purpose of the experiment.

Apparatus

The stimuli were generated and controlled by means of a custom-made program created in MATLAB (The Mathworks, Inc.), the Cogent Graphics and Cogent 2000 toolboxes (www.vislab.ucl.ac.uk/cogent.php), and a PC (Model XPS720, Dell Computer, Austin, TX; Windows Vista operating system, Microsoft, Redmond, WA). The visual stimuli were displayed on a CRT monitor (Diamondtron M2 RDF223G, Mitsubishi, Tokyo, Japan; resolution 1,024 × 768 pixels, refresh rate 60 Hz). The auditory stimuli were conveyed through an audio interface (Edirol FA-66, Roland) and headphones (HDA200, Sennheiser). The simultaneity of the visual and auditory stimuli was confirmed using a digital oscilloscope (TS-80600, Iwatsu, Tokyo, Japan). The experiment was carried out in a dark room with approximately 39.1 dB(A) of background noise. The participants viewed the monitor binocularly from a distance of 57.3 cm with their heads stabilized on a chinrest.

Stimuli

A red (21.01 cd/m²) fixation dot and a white (105 cd/m²) disk were presented as visual stimuli on a gray (24.38 cd/m²) background. The fixation stimulus subtended approximately 1.14 deg in diameter. The visual stimuli were of two types (standard and comparison) and were presented in three sizes. The standard stimulus was 4.0 deg in diameter, and the comparison stimuli were 3.8, 4.0, and 4.2 deg in diameter. The distance from the center of the fixation to the center of the visual stimuli was 7.0 deg; the presentation duration was 50 ms. The auditory stimulus was white noise of 50-ms duration (including a ramp time of 5 ms at the start and the end of the sound wave envelope); the sound pressure levels (SPLs) were 55 and 85 dB. An auditory stimulus was not presented on one third of trials, and on the other two thirds of trials, a 55- or 85-dB SPL stimulus was presented simultaneously with the visual stimulus. The experiment was based on a 3 (object size: 3.8, 4.0, or 4.2 deg) × 3 (auditory condition: no sound, 55 dB, or 85 dB) design.
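
To give a sense of the physical scale of these stimuli, the following minimal sketch converts visual angle to on-screen extent at the 57.3-cm viewing distance reported in the Apparatus section. The function name `visual_angle_to_cm` is hypothetical, and the calculation assumes a flat screen viewed head-on; it is not taken from the original stimulus program.

```python
import math

def visual_angle_to_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical extent (cm) that subtends angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

VIEWING_DISTANCE_CM = 57.3  # viewing distance reported in the Apparatus section

# Diameters of the comparison disks, in degrees of visual angle.
for size_deg in (3.8, 4.0, 4.2):
    size_cm = visual_angle_to_cm(size_deg, VIEWING_DISTANCE_CM)
    print(f"{size_deg:.1f} deg -> {size_cm:.2f} cm on screen")
```

At this distance, 1 deg corresponds to roughly 1 cm, so the 0.2-deg difference between the standard and the non-matching comparison disks amounts to about 2 mm in diameter.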

Procedure

Each trial was composed of a standard stimulus and a comparison stimulus. Figure 1 shows a schematic of a trial; a 3 × 3 factorial design was used, with Object Size and Auditory Condition as within-subjects factors. The participants completed 360 trials (40 repetitions per condition). On half (180) of the trials, the comparison and standard stimuli were presented first and second, respectively; on the remaining half of the trials, this order was reversed. The temporal ordering of the comparison stimulus presentation (first or second stimulus) was randomized. The participants were instructed to fixate on the fixation point. Trials began when the participants pressed the “0” key; each trial began with a 1,000-ms fixation point, followed by the first stimulus, presented for 50 ms. After a 500-ms interstimulus interval (ISI), the second stimulus was presented for 50 ms. The participants were instructed to “report the stimulus perceived as being larger in size” by pressing either of two keys: “1” for the first stimulus, or “3” for the second stimulus.
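
The trial structure described above can be sketched as follows. This is an illustration only, not the authors' MATLAB/Cogent code: the names `build_trials`, `SIZES_DEG`, and `AUDITORY` are hypothetical, and the sketch assumes that presentation order was balanced within each condition, whereas the text states only that half of all trials used each order.

```python
import itertools
import random

SIZES_DEG = (3.8, 4.0, 4.2)          # comparison sizes from the Stimuli section
AUDITORY = ("none", "55dB", "85dB")  # auditory conditions
REPS_PER_CONDITION = 40              # 9 conditions x 40 repetitions = 360 trials

def build_trials(seed=None):
    """Return a shuffled list of trial dicts for the 3 x 3 design.

    Assumption: order is balanced within each condition
    (20 comparison-first and 20 comparison-second trials).
    """
    rng = random.Random(seed)
    trials = []
    for size, sound in itertools.product(SIZES_DEG, AUDITORY):
        for rep in range(REPS_PER_CONDITION):
            trials.append({
                "comparison_deg": size,
                "sound": sound,
                "comparison_first": rep % 2 == 0,
            })
    rng.shuffle(trials)  # randomize trial order across the session
    return trials

trials = build_trials(seed=1)
print(len(trials), sum(t["comparison_first"] for t in trials))
```

Passing a seed makes the shuffled order reproducible, which is convenient when checking a trial list before running participants.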

Fig. 1. Schematic representation of the procedure of the present experiments. The upper and lower panels indicate trials on which the comparison stimulus was presented as the first and the second stimulus, respectively.

Results and discussion

We calculated the rate at which the comparison stimulus was chosen as being “larger” in each condition. The results are shown in Fig. 2. If participants perfectly discriminated the size of the stimuli, there should be 0 % “larger” responses for the 3.8-deg disk trials, and 100 % “larger” responses for the 4.2-deg disk trials. Moreover, the response rate should approximate 50 % (chance level) on the 4.0-deg disk trials, because the size of the comparison stimulus was the same as that of the standard stimulus under this condition. The participants showed rates of 22.3 %, 46.4 %, and 62.2 % “larger” responses for the 3.8-, 4.0-, and 4.2-deg disk trials, respectively, in the no-sound (i.e., control) condition. These results indicated that the participants’ size discrimination was reliable, if imperfect. Figure 2 also shows the effects of sounds on visual size perception. The proportions of “larger” responses were higher for the 85-dB than for the no-sound condition. However, the proportions of “larger” responses were lower for the 55-dB than for the no-sound trials. This pattern suggests that high-intensity sounds increase perceived visual object size, whereas low-intensity sounds decrease it. The statistical test partly supported these effects. A two-way analysis of variance (ANOVA) with Object Size and Auditory Condition as within-subjects factors was conducted after arcsine transformation of the data. This transformation was conducted to stabilize the variances, which otherwise depend on the value of the “larger” rate (P). The following formula shows this transformation (Bland, 1995):

$$ X_i^{\prime} = \sin^{-1}\sqrt{P}. $$
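
As a minimal sketch of this transformation, the function below applies the arcsine square-root formula to a proportion. The function name `arcsine_transform` is hypothetical. In the analysis the transform would be applied to each participant's rate in each condition; the condition means reported for the no-sound trials are used here only because individual data are not given in the text.

```python
import math

def arcsine_transform(p: float) -> float:
    """Return asin(sqrt(p)) in radians for a proportion p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))

# Mean "larger" rates reported for the no-sound condition (3.8, 4.0, 4.2 deg).
for p in (0.223, 0.464, 0.622):
    print(f"P = {p:.3f} -> X' = {arcsine_transform(p):.3f} rad")
```

The transformed values run from 0 (P = 0) to π/2 (P = 1), with P = .5 mapping to π/4.
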
Fig. 2. The horizontal axis indicates object size, and the vertical axis indicates the mean rate at which the comparison stimulus was chosen as being “larger.” The error bars represent the standard errors of the means (n = 7).

The main effect of object size was significant [F(2, 12) = 249.08, p < .001]. Multiple comparisons (Ryan’s method) for the main effect of object size showed significant differences among the three object sizes, indicating the participants’ ability to discern size differences. The main effect of the auditory condition was also significant [F(2, 12) = 9.14, p < .005], indicating that participant performance changed as a function of auditory condition. Multiple comparisons for the main effect of auditory condition established that the differences between the no-sound and 85-dB conditions (p < .05) and between the 55- and 85-dB conditions (p < .005) were significant, indicating that adding an 85-dB SPL white noise burst increased the perceived visual sizes of objects. In contrast, the difference between the no-sound and 55-dB conditions was not significant (p > .05); that is, adding a 55-dB SPL white noise burst did not strongly affect size perception. The interaction between object size and auditory condition was not significant [F(4, 24) = 0.56, p = .69].
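
The degrees of freedom in these F tests follow directly from the design, and a short check can make the bookkeeping explicit. The sketch below assumes a fully within-subjects two-way ANOVA in which each effect is tested against its own effect-by-subject error term, with no sphericity correction; the helper name `rm_anova_df` is hypothetical.

```python
def rm_anova_df(n_subjects: int, levels_a: int, levels_b: int) -> dict:
    """Degrees of freedom for a two-way fully within-subjects ANOVA.

    Each effect is tested against its effect-by-subject interaction,
    so df_error = df_effect * (n_subjects - 1).
    """
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_ab = df_a * df_b
    s = n_subjects - 1
    return {
        "A": (df_a, df_a * s),
        "B": (df_b, df_b * s),
        "AxB": (df_ab, df_ab * s),
    }

# Experiment 1: 7 participants, 3 object sizes x 3 auditory conditions.
print(rm_anova_df(7, 3, 3))
# → {'A': (2, 12), 'B': (2, 12), 'AxB': (4, 24)}
```

These values match the reported F(2, 12) for both main effects and F(4, 24) for the interaction.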

These results show that the participants reliably discriminated object size and that auditory stimuli altered their visual perceptions of object size; specifically, higher-intensity auditory stimuli increased perceived object size. This result corresponds with those of the matching task employed by Lipscomb and Kim (2004). Additionally, low-intensity auditory stimuli tended to decrease perceived object size. However, this effect lacked statistical significance. Thus, only high-intensity sound modified the visual perception of size.

Experiment 2

The results of Experiment 1 demonstrated that sounds affect visually perceived object size. However, these results might have been induced by participant response bias. If the observed effects were caused by response bias, the judged size should be altered regardless of the synchrony or asynchrony of the visual and auditory stimuli. In Experiment 2, we manipulated the stimulus onset asynchrony (SOA) between the auditory and visual stimuli. We specifically examined whether this audiovisual interaction is confined to a limited temporal window. If it is, the effect cannot readily be attributed to participant response bias.

Method

Participants

A group of 11 observers (eight women, three men), nine of whom had not taken part in Experiment 1, participated in Experiment 2. Their vision and audition were normal or corrected to normal, and none of the participants had been informed of the purpose of the experiment.

Stimuli

This experiment included three sizes of visual stimuli, as in Experiment 1. The standard stimulus was 4.0 deg in diameter, as was the comparison stimulus. White disks 3.8 and 4.2 deg in diameter were used in the catch trials, in order to prevent the participants from noticing that the size of the comparison stimulus was the same as that of the standard stimulus. For the purposes of Experiment 2, we manipulated the SPLs of the auditory stimuli and the SOA between the visual and auditory stimuli. The experiment had a 2 (SPL: 55 or 85 dB) × 7 (SOA: −300, −200, −100, 0, +100, +200, or +300 ms) design; a minus sign indicates that the auditory stimulus was presented before the visual stimulus, and a plus sign indicates that it was presented after. The other stimulus conditions were the same as in Experiment 1.
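
The SOA sign convention can be illustrated with a small timing sketch. It treats SOA as the onset-to-onset interval and uses the 50-ms stimulus duration carried over from Experiment 1; the function names `onsets_ms` and `overlaps` are hypothetical and are not part of the original stimulus program.

```python
SOAS_MS = (-300, -200, -100, 0, 100, 200, 300)  # SOA levels of the design
STIM_DURATION_MS = 50                           # duration of disk and noise burst

def onsets_ms(soa_ms: int, visual_onset_ms: int = 0) -> dict:
    """Onset times under the stated sign convention.

    Negative SOA: the sound starts before the disk.
    Positive SOA: the sound starts after the disk.
    """
    return {"visual": visual_onset_ms, "auditory": visual_onset_ms + soa_ms}

def overlaps(soa_ms: int) -> bool:
    """True if the 50-ms sound and the 50-ms disk overlap in time."""
    return abs(soa_ms) < STIM_DURATION_MS

for soa in SOAS_MS:
    timing = onsets_ms(soa)
    print(soa, timing["auditory"], overlaps(soa))
```

With 50-ms stimuli, the sound and the disk physically overlap only at an SOA of 0 ms; at ±100 ms there is already a 50-ms gap between the offset of one and the onset of the other.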

Procedure

The trial flow was the same as in Experiment 1 (see Fig. 1). A 2 × 7 factorial design was used, with SPL and SOA as within-subjects factors. Each participant completed 392 trials (280 experimental trials [20 repetitions per condition] + 112 catch trials [four repetitions per condition for each size]).

Results and discussion

The results of the catch trials were removed from the analysis. We calculated the rate at which the comparison stimulus was chosen as being “larger” in each condition. The results are shown in Fig. 3. If participants perfectly discriminated the sizes of the stimuli, they should provide approximately 50 % (chance level) “larger” responses, as in Experiment 1. Figure 3 indicates that the participants produced approximately 50 % “larger” responses at SOAs of −300, −200, +200, and +300 ms. However, sounds did affect size perception in the SOA range of −100 to +100 ms, with “larger” response rates rising to more than 50 % on the 85-dB trials and falling below 50 % on the 55-dB trials. These effects were supported by a two-way ANOVA, with SPL and SOA as within-subjects factors, which was conducted after arcsine transformation of the data. The main effect of SPL was significant [F(1, 10) = 9.90, p < .05]: The rate of “larger” responses for the 85-dB trials was significantly higher than that for the 55-dB trials. The main effect of SOA was not significant [F(6, 60) = 1.19, p = .33]. However, the interaction between SPL and SOA was significant [F(6, 60) = 2.93, p < .05]. The simple main effect of SPL was significant at each SOA in the range from −100 to +100 ms (each SOA: p < .005); at these SOAs, the rate of “larger” responses was higher for the 85-dB than for the 55-dB trials.

Fig. 3. The horizontal axis indicates stimulus onset asynchrony, and the vertical axis indicates the mean rate at which the comparison stimulus was chosen as being “larger.” The error bars represent the standard errors of the means (n = 11).

These results show that the effect of a higher-intensity sound on visual object size perception is limited to a narrow temporal window (−100 to +100 ms). This temporal window is similar to that found with other audiovisual interactions (e.g., −115 to +115 ms in Shams, Kamitani & Shimojo, 2002). Indeed, audiovisual interactions generally occur within a narrow temporal window, given that multisensory information can bind only within that range (i.e., only within this range are multisensory events regarded as being single, bound events). Because a response bias would not depend on audiovisual timing, this effect can hardly be attributed to participant response bias.

Experiment 3

In Experiment 1, the audiovisual interaction effects on size perception were asymmetric: High-intensity sounds reliably altered perceived size, whereas low-intensity sounds did not. We assumed that the cue reliability (saliency) of each sensory stimulus might be related to this interaction. The cue reliability of visual stimuli decreases with increasing distance between a fixation point and visual stimuli (i.e., eccentricity). Indeed, several audiovisual interaction phenomena have been reported under conditions in which visual stimuli were unreliable because of manipulations in eccentricity (e.g., Hidaka et al., 2009; Shams et al., 2002). Following these studies, in the present experiment, relative reliability was manipulated through changes in eccentricity: We investigated the effects of retinal eccentricity (i.e., the difference between the central and peripheral visual fields) on the modulation of visual size by sound.

Method

Participants

A group of eight observers (five women, three men), seven of whom had not been part of Experiment 1 or 2, participated in the present experiment. Their vision and audition were normal or corrected to normal, and none of the participants had been informed of the purpose of the experiment.

Stimuli

The sizes of both the standard and comparison stimuli were 4.0 deg in diameter. As in Experiment 2, we additionally prepared a white disk of 4.2-deg diameter, in order to prevent the participants from noticing that the comparison stimulus was the same as the standard stimulus. We also manipulated the retinal eccentricity of the visual stimuli. The experiment had a 3 (auditory condition: no sound, 55 dB, or 85 dB, as in Exp. 1) × 5 (eccentricity: 2.5, 3.75, 5.0, 7.5, or 10.0 deg) design. The other stimulus conditions were the same as in Experiment 1.

Procedure

The trial flow was the same as in Experiment 1 (see Fig. 1). A 3 × 5 factorial design was used, with Auditory Condition and Eccentricity as within-subjects factors. Each participant completed 420 trials (300 experimental trials [20 repetitions per condition] + 120 catch trials [eight repetitions per condition]).

Results and discussion

The results of the catch trials were removed from the analysis. We calculated the rate at which the comparison stimulus was chosen as being “larger” in each condition. The results are shown in Fig. 4. If participants perfectly discriminated the size of the stimuli, they should provide approximately 50 % (chance level) “larger” responses, as in Experiments 1 and 2. The participants gave about 50 % (chance level) “larger” responses in the no-sound conditions, indicating that they discriminated size correctly in Experiment 3. In addition, the same effects of the sounds were observed as in Experiments 1 and 2, with the proportion of “larger” responses being higher for the 85-dB trials than for the no-sound trials. Conversely, the proportion of “larger” responses was lower for the 55-dB trials than for the no-sound trials. These effects appeared to be particularly strong for eccentricity values of 5.0 deg. The statistical test partly supported these results. A two-way ANOVA, with Auditory Condition and Eccentricity as within-subjects factors, was conducted after arcsine transformation of the data. The main effect of auditory condition was significant [F(2, 14) = 10.57, p < .005], indicating a change in participant performance as a function of the auditory stimulus. Multiple comparisons of the main effect of auditory condition indicated that the differences between the no-sound and 85-dB trials and between the 55- and 85-dB trials were significant. However, the difference between the no-sound and 55-dB trials was not significant. The main effect of eccentricity was not significant [F(4, 28) = 1.61, p = .20]. However, the interaction between auditory condition and eccentricity was significant [F(8, 56) = 2.20, p < .05]. The simple main effects of auditory condition were significant at eccentricities of 5.0 deg and over (5.0 deg, p < .05; 7.5 and 10.0 deg, ps < .001). Multiple comparisons indicated that the proportion of “larger” responses was higher for the 85-dB condition than for the no-sound and 55-dB conditions.

Fig. 4. The horizontal axis indicates stimulus eccentricity, and the vertical axis indicates the mean rate at which the comparison stimulus was chosen as being “larger.” The error bars represent the standard errors of the means (n = 8).

Differences in modulation by the auditory stimulus were observed between the central and peripheral visual fields. These results indicate that presenting high-intensity auditory stimuli simultaneously with visual stimuli increased the perceived size of objects in the peripheral visual field (retinal eccentricity ≥ 5.0 deg). In contrast, the sizes of the visual stimuli were not modulated by simultaneously presented high-intensity auditory stimuli in the central visual field (retinal eccentricity < 5.0 deg). Notably, visual acuity is known to decline in the peripheral visual field (Anstis, 1974). Accordingly, when the acuity of visual size information is low, the size information carried by sound appears to be used.

General discussion

In the present study, we examined the effects of auditory stimuli on the visual perception of size. In Experiment 1, we investigated the effects of the intensities of such stimuli. In Experiment 2, we examined the temporal window of this audiovisual interaction by manipulating the SOAs between visual and auditory stimuli. Finally, in Experiment 3, we manipulated the retinal eccentricity of visual stimuli to determine the effect of this interaction across different visual fields. The results indicated that visual object size was perceived as being larger when visual stimuli were accompanied by a higher-intensity white noise burst. In contrast, the visually perceived size was not significantly altered by lower-intensity sound. The results also showed that the temporal window of this effect was restricted to the range of −100 to +100 ms, a temporal window very similar to those reported for other audiovisual interactions (e.g., –115 to +115 ms in Shams et al., 2002). We consider an appropriate temporal window of audiovisual integration to be the range within which multisensory events are regarded as a single event. According to Guski and Troje (2003), this range can be estimated as −130 to +250 ms. Thus, the temporal window of the present effect was narrower than that described above for an audiovisual interaction. Therefore, the results that we observed were unlikely to be caused only by participant response bias. Moreover, it was clear that this audiovisual interaction occurred in the peripheral visual field. Overall, the data suggest that the information carried by sound alters the visual perception of size. Object size is related to velocity and weight; therefore, velocity or weight may be indirectly modulated by alterations in size perception due to audiovisual interaction.

Whereas the effects of visual–tactile interactions on size perception have been examined, audiovisual interactions in size perception have not been investigated extensively. The tactile sense can register the sizes of objects; however, the auditory sense takes in information from more distant sources than the tactile sense can reach. Thus, audiovisual size perception may be related to 3-D interpretation. Our results indicated that visual objects were perceived as being larger when they were accompanied by high-intensity sound. This is analogous to cases in which target objects are judged as being larger than their actual size because they are located nearby. In the matching task, large-sized objects were associated with high-intensity sound (Lipscomb & Kim, 2004). Therefore, humans may perceptually revise unclear object sizes according to the correspondence between retinal size and loudness.

Multisensory integration is conducted by optimally utilizing each sensory information cue (Alais et al., 2010; Ernst & Banks, 2002). Accordingly, perception built by multisensory integration depends on the reliability of the information cue for each sense (Angelaki, Gu & DeAngelis, 2009). In the present study, visually perceived size was altered by the presence of a high-intensity auditory stimulus. Low-intensity sounds might also alter participant judgments in the opposite direction, although that effect was not statistically significant. These findings are supported by Lipscomb and Kim (2004), who showed that larger object sizes tend to correspond with higher intensity sounds, while small object sizes tend to correspond with low-intensity sounds. Therefore, it is plausible that lower-intensity sounds may decrease perceived object size, although we found that this effect was much weaker than that of high-intensity sounds. The evidence indicates that the intensity of sounds relates to cue reliability when multisensory information is integrated. Moreover, this audiovisual interaction is particularly strong for objects presented in the peripheral visual field. This is consistent with evidence that cue reliability declines when visual stimuli are presented in the periphery (Anstis, 1974). The phenomenon of perceived size alteration is especially strong when auditory information compensates for the low perceptual acuity of the information provided by visual stimuli in the peripheral visual field. In contrast, visually perceived size is not altered in the central visual field, where the reliability of visual information is high.
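
The reliability argument above can be illustrated with the standard reliability-weighted (inverse-variance) cue-combination rule associated with Ernst and Banks (2002). The sketch below is not a model fitted to the present data: the function name `combine_cues` is hypothetical, all numbers are invented for illustration, and expressing the information carried by sound as a size estimate in degrees is an assumption made only to show the direction of the effect.

```python
def combine_cues(est_v: float, var_v: float, est_a: float, var_a: float):
    """Reliability-weighted average of a visual and an auditory estimate.

    Weights are proportional to inverse variance (reliability).
    Returns (combined_estimate, combined_variance, visual_weight).
    """
    rel_v = 1.0 / var_v
    rel_a = 1.0 / var_a
    w_v = rel_v / (rel_v + rel_a)
    combined = w_v * est_v + (1.0 - w_v) * est_a
    combined_var = 1.0 / (rel_v + rel_a)
    return combined, combined_var, w_v

# Hypothetical numbers: vision signals 4.0 deg, the sound "suggests" 4.4 deg.
central = combine_cues(4.0, 0.01, 4.4, 1.0)     # reliable central vision
peripheral = combine_cues(4.0, 0.25, 4.4, 1.0)  # noisier peripheral vision
print(central[0], peripheral[0])
```

With these invented values, the combined estimate stays close to the visual estimate when vision is reliable and shifts toward the auditory estimate when visual variance is larger, which is the qualitative pattern described for the central and peripheral visual fields.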

A feedback stream from the early auditory cortex (A1) to the early visual cortex (V1) has been found (Clavagnier, Falchier & Kennedy, 2004; Falchier, Clavagnier, Barone & Kennedy, 2002; Rockland & Ojima, 2003). Visual and auditory processing may thus be intertwined from their early stages. It is therefore possible that size-related information (i.e., sound intensity) is extracted from auditory stimuli in the early stages of auditory processing, sent to the early visual cortex, and then processed in the visual stream.

We showed that visual perception of size is affected by sound when the size information provided by the visual sense is unclear. This suggests that auditory intensity information is used to complement visual size perception. Multisensory interactions are observed for various sensory properties that are important in the perception of the external environment (e.g., velocity, motion, or size). Multisensory integration completes sensory perception on the basis of information from other sensory modalities, thereby stabilizing perception. Our multisensory integrative capacity is essential to our basic abilities to function in the physical world and to perceive our surroundings.