Experimental Brain Research, Volume 193, Issue 3, pp 409–419

The effect of sound intensity on the audiotactile crossmodal dynamic capture effect

Authors

  • Valeria Occelli
    • Department of Cognitive Sciences and Education, University of Trento
    • Department of Experimental Psychology, Crossmodal Research Laboratory, Oxford University
  • Charles Spence
    • Department of Experimental Psychology, Crossmodal Research Laboratory, Oxford University
  • Massimiliano Zampini
    • Department of Cognitive Sciences and Education, University of Trento
    • Center for Mind/Brain Sciences, University of Trento
Research Article

DOI: 10.1007/s00221-008-1637-9

Cite this article as:
Occelli, V., Spence, C. & Zampini, M. Exp Brain Res (2009) 193: 409. doi:10.1007/s00221-008-1637-9

Abstract

We investigated the effect of varying sound intensity on the audiotactile crossmodal dynamic capture effect. Participants had to discriminate the direction of a target stream (tactile, Experiment 1; auditory, Experiment 2) while trying to ignore the direction of a distractor stream presented in a different modality (auditory, Experiment 1; tactile, Experiment 2). The distractor streams could either be spatiotemporally congruent or incongruent with respect to the target stream. In half of the trials, the participants were presented with auditory stimuli at 75 dB(A) while in the other half of the trials they were presented with auditory stimuli at 82 dB(A). Participants’ performance on both tasks was significantly affected by the intensity of the sounds. Namely, the crossmodal capture of tactile motion by audition was stronger with the more intense (vs. less intense) auditory distractors (Experiment 1), whereas the capture effect exerted by the tactile distractors was stronger for less intense (than for more intense) auditory targets (Experiment 2). The crossmodal dynamic capture was larger in Experiment 1 than in Experiment 2, with a stronger congruency effect when the target streams were presented in the tactile (vs. auditory) modality. Two explanations are put forward to account for these results: an attentional biasing toward the more intense auditory stimuli, and a modulation induced by the relative perceptual weight of, respectively, the auditory and the tactile signals.

Keywords

Touch, Audition, Multisensory, Sound intensity, Crossmodal dynamic capture

Introduction

A large body of empirical research has investigated how sensory modalities convey dynamic information (e.g., Gardner and Sklar 1994; Getzmann and Lewald 2007; Soto-Faraco and Kingstone 2004; Sekuler et al. 2002). In particular, many researchers have attempted to investigate how the senses (e.g., audition, vision, and touch) interact to provide a representation of dynamic perceptual events (e.g., Alais and Burr 2004; Bensmaïa et al. 2006; Craig 2006; Huddleston et al. 2008; Lakatos and Shepard 1997; Meyer and Wuerger 2001; Senkowski et al. 2007; see also Soto-Faraco and Kingstone 2004, for a review). This issue is of particular interest given that multisensory integration is central to our perception of motion for many everyday objects and events (cf. Zihl et al. 1983). For instance, information concerning the rapid approach of a car is not only provided by visual cues (i.e., the rapid expansion of the retinal image) but also by auditory (i.e., the increasing sound emitted by the engine) and perhaps even tactile (i.e., the displacement of the air) cues.

Much research has also addressed the role of crossmodal processing in the perception of apparent motion (see Soto-Faraco et al. 2003, for a review). The impression of apparent movement is experienced when two stationary stimuli are displayed in rapid succession from two different spatial positions. Although no physical movement is present, the observer has the impression of a single object moving continuously through space from one position to the other. Largely investigated in vision (e.g., Ramachandran and Anstis 1986; Wertheimer 1912; Yantis and Nakama 1998), this phenomenon has also been shown to occur in the auditory (e.g., Strybel et al. 1989; Griffiths et al. 1994) and tactile (e.g., Gardner and Sklar 1994; Kirman 1974; Olausson and Norrsell 1993) modalities (see Kolers 1972, for a review). The perception of apparent motion is modulated by the spatiotemporal relations between the displayed stimuli. According to Korte’s (1915) third law of apparent motion, the interstimulus interval required for optimal apparent motion is a function of the distance between stimulus positions, provided that stimulus exposure duration and intensity are kept constant. These parameters, originally established for the case of visual apparent motion, also generalize to tactile and auditory apparent motion, thus suggesting that, at least to a certain extent, the spatiotemporal properties of apparent motion are shared across the senses (Lakatos and Shepard 1997; also see Strybel et al. 1990).

One experimental paradigm that has frequently been used in recent years to investigate how sensory modalities interact in the perception of apparent motion is the “crossmodal dynamic capture” task (Soto-Faraco et al. 2002). In a typical study, two pairs of unimodal stimuli are presented from two different spatial locations at the appropriate temporal interval in order to give rise to the impression of one apparent motion stream in each sensory modality. Participants are instructed to determine the direction of motion in the target modality while simultaneously trying to ignore the apparent motion of the stimuli presented in the distractor modality. People are generally able to accurately judge the direction of target motion when it is concurrently presented with spatially congruent distractor motion, when presented asynchronously with respect to the distractor motion, or else when presented in isolation (i.e., in the absence of any distractor stimuli). However, participants’ performance is often dramatically impaired when they have to try and determine the direction of a target stream presented at the same time as a distractor stream moving in the opposite direction. The crossmodal dynamic capture effect has now been examined between various different pairs of sensory modalities, such as between vision and audition (Sanabria et al. 2004, 2007), between vision and touch (see Lyons et al. 2006; Soto-Faraco et al. 2004a), and between touch and audition (Sanabria et al. 2005; Soto-Faraco et al. 2004a).

The nature of the crossmodal dynamic capture effect—i.e., whether it reflects a genuinely perceptual and/or a post-perceptual/decisional phenomenon—has been investigated recently (Soto-Faraco et al. 2005). Controlling for response-compatibility confounds (i.e., by making participants report whether the two streams moved in the same vs. different directions instead of discriminating between the right vs. left direction of the target stream) and for the use of response strategies (i.e., by presenting the streams at SOAs at which directional information is not consciously available to the observer), the authors found that the thresholds obtained for correct directional discrimination were higher when the two streams were presented together than when they were presented in isolation, thus supporting the account of an automatic perceptual integration of the moving signals.

The perceptual nature of the capture effect (although note that some contribution of post-perceptual factors related to the decision making and/or the response execution cannot be rejected completely in all cases/studies; see Soto-Faraco et al. 2005) led Soto-Faraco and his colleagues to consider it as a genuine capture-like phenomenon, and not just simply the interference of one sensory modality on people’s ability to accurately process the direction in the other modality. In other words, under the appropriate spatiotemporal conditions, task-irrelevant (apparent) motion can significantly affect the direction in which the target (apparent) motion is perceived to occur. Indeed, in the audiovisual version of the dynamic capture task, it has been reported that the participants not only fail to report correctly the direction of the target stream but also report having perceived the auditory stimulus as moving in the same direction as the visual stimulus (Soto-Faraco et al. 2004b, 2005). Some tentative recent evidence has also suggested that a fusion of the signals also occurs in the case of the perception of motion presented in the auditory and tactile modalities (Ooshima et al. 2008), possibly underlying the capture effect observed in the audiotactile version of the task (Sanabria et al. 2005; Soto-Faraco et al. 2004a).

The pattern of results reported in these studies also suggests that the domain of apparent motion perception is characterized by specific asymmetries, as extensively documented in the multisensory integration of static stimuli (see Bertelson and de Gelder 2004; Caclin et al. 2002). In particular, visual motion has been found to profoundly influence the perception of both auditory (Soto-Faraco et al. 2004b; Strybel and Vatakis 2004) and tactile (Lyons et al. 2006) motion, with a capture effect occurring in approximately 50 and 40% of responses, respectively, while visual apparent motion tends not to be captured by stimuli presented in the other modalities.

In the audiotactile domain, contrary to what has been reported for those modality pairings involving vision, the dynamic capture effect occurs in both directions (Sanabria et al. 2005; Soto-Faraco et al. 2004a), with touch capturing audition and audition capturing touch. However, the effect has been shown to be stronger when the target motion is auditory and the distractors are tactile than when the target motion is tactile and the distractors are auditory (occurring in 35 and 15% of responses, respectively). As pointed out by the authors themselves, it is worth investigating whether the asymmetrical capture effect reported between audition and touch reflects constraints inherent in the organization of specific perceptual systems (and thus is consistently replicable across different experimental conditions) or whether instead it reflects the specific set of stimulus parameters used in their study (and thus is liable to be affected by relatively minor changes in the experimental conditions).

As documented previously, hearing and skin sensations share many functional similarities, as both might be induced by the mechanical stimulation (i.e., changes in pressure and/or vibratory rates) of, respectively, the basilar membrane and the skin (see Gescheider 1970; Sherrick 1976; von Békésy 1959). If it is true that auditory and tactile stimuli are in some sense physically linked, it might be argued that the pattern of results reported by Soto-Faraco et al. (2004a) could be strictly related to the particular nature of the stimuli used in that study. If this were to be the case, the asymmetric crossmodal capture effect they reported could be reduced, or even reversed, by manipulating the specific properties of the stimuli presented in one of the two stimulus modalities.

In the present study, we explicitly tested whether, and to what extent, the change of one of the physical properties of the sound (i.e., its intensity) would affect the crossmodal capture effect between audition and touch in a crossmodal dynamic capture task. The auditory stimuli could be presented at one of two sound intensities (75 or 82 dB(A)) and the potential effect of sound intensity on performance was tested by adopting a within-participants experimental design. If the manipulation of this dimension is crucial for the stimuli presented in one sensory modality to influence those presented in another, one might expect that when the target modality is presented auditorily, the tactile distractor motion should induce a stronger influence on the less intense auditory stream than on the more intense stream. Conversely, the crossmodal capture effect of tactile target motion should be stronger when the task-irrelevant motion is constituted by more intense rather than by less intense auditory stimuli. Therefore, the magnitude of the congruency effect between audition and touch and between touch and audition should vary as a function of the intensity of the auditory stimuli.

Experiment 1

Experiment 1 was designed to evaluate people’s ability to discriminate the direction of a tactile apparent motion stream while trying to ignore the direction of an irrelevant auditory apparent motion stream. The sound intensity varied from block to block (82 vs. 75 dB(A) sound pressure level as measured from the participant’s head position).

Participants

Twenty blindfolded participants took part in this study (6 males and 14 females; mean age of 24 years; range from 19 to 44 years). All of the participants reported normal hearing and normal tactile sensitivity. The experiment took approximately 45 min to complete and was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki (most recently amended in 2004, Tokyo; see Blackmer and Haddad 2005). All of the participants gave their informed consent prior to their inclusion in the study.

Methods

Apparatus and materials

Two loudspeaker cones (Dell, A215; Round Rock, TX) positioned on the table-top in front of the participant were used to present the auditory stimuli. The loudspeaker cones were placed approximately 50 cm from the participant’s body, 15 cm to either side of their midline. Two vibrotactile stimulators (bone conduction vibrators, Oticon-A, 100 Ohm; Hamilton, Scotland) were placed one in front of each loudspeaker cone, to ensure that the sounds and vibrotactile stimuli came from the same spatial locations (see Fig. 1 for a schematic view of the experimental set-up). The participants responded using footpedals located under the table (one beneath the toes of the right foot and the other beneath the toes of the left foot). The loudspeaker cones, vibrotactile stimulators, and footpedals were all controlled via a computer parallel port using the E-Prime programming language (http://www.pstnet.com), and a custom-built relay box. The experiment was conducted in a dimly-illuminated room. The auditory stimuli consisted of two 50-ms white noise bursts, one presented from each loudspeaker cone, separated by a 100-ms inter-stimulus interval (ISI) that remained constant across all conditions. The tactile displays consisted of two 50-ms suprathreshold vibrations, one presented from each vibrator, separated by an ISI of 100 ms.
https://static-content.springer.com/image/art%3A10.1007%2Fs00221-008-1637-9/MediaObjects/221_2008_1637_Fig1_HTML.gif
Fig. 1

Schematic diagram showing the experimental set-up used in experiments 1 and 2

Procedure

The participants sat in front of the loudspeaker cones. To attenuate any noise resulting from their operation, the two vibrotactile stimulators rested on two foam rectangles (3 cm thick) placed directly in front of the loudspeaker cones. The participants were requested to rest their left index fingertip on the left vibrotactile stimulator and their right index fingertip on the right vibrotactile stimulator. The participants were instructed to rest their feet on the footpedals and to keep their head still while looking straight ahead throughout each block of experimental trials. White noise was presented from two loudspeaker cones positioned 60 cm behind the loudspeaker cones used to present the auditory stimuli, to mask any auditory cues elicited by the activation of the vibrotactile stimulators. The participants were presented with two blocks of trials, each consisting of 96 trials. In one block, the auditory distractor stimuli were presented at 75 dB(A), while in the other block, the auditory distractors were presented at 82 dB(A).1 The order of presentation of the blocks was counterbalanced across participants. The intensity of the masking white noise was also varied from block to block: it was set at 75 dB(A) for the blocks in which the auditory stimuli were presented at 82 dB(A), and at 70 dB(A) for the blocks in which the auditory stimuli were presented at 75 dB(A).

In a typical trial, the participants were presented with the target vibrotactile stream to which they had to make an unspeeded footpedal discrimination response, and a distractor auditory stream, which they were instructed to try and ignore. The distractor stream could either be presented at the same time as the target tactile stream (synchronous) or else 500 ms later (asynchronous) and in either the same (congruent) or opposite (incongruent) direction (from right-to-left or left-to-right). The participants were instructed to respond to the direction of the tactile stream2 by pressing the corresponding footpedal (left for leftward-moving targets, and right for rightward-moving targets) and to ignore the auditory distractors as much as possible. The participants were instructed to prioritize response accuracy over response speed. Responses were only collected after 750 ms from the beginning of the trial (i.e., after the complete display of the stimuli), in order to ensure that any lack of an effect of the auditory distractors on the perception of the tactile stream in the asynchronous condition was not caused simply by the participants responding to the tactile target before the auditory distractors had been presented (see Soto-Faraco et al. 2004b, on this point). After a participant’s response had been recorded, there was a random interval (of 1,900, 1,950, 2,000, 2,050, or 2,100 ms) before the start of the next trial. 
At the end of each block of trials, the participants were instructed to use two 7-point Likert scales in order to judge the strength of their perception of apparent movement elicited by the target stream (1 = no sensation of movement; 7 = strong sensation of movement) and their confidence in their response (1 = no confidence in their responses; 7 = high confidence in their responses).3 The participants completed one block of 12 practice trials at the start of their experimental session in which the tactile streams were presented in the absence of any distractors, to familiarize them with the task at hand.
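
The timing and block structure described above can be sketched as follows. This is a minimal illustration in Python, not the E-Prime script that was actually used. The function names, the onset-to-onset reading of the 500-ms delay, and the even spread of the 96 trials over the Synchrony × Congruency × target direction cells are all assumptions rather than details reported in the text.

```python
import random
from itertools import product

# Timing values taken from the Methods and Procedure text (all in ms).
BURST_MS = 50             # duration of each stimulus within a stream
ISI_MS = 100              # gap between the two stimuli of a stream
ASYNC_DELAY_MS = 500      # distractor delay on asynchronous trials
RESPONSE_OPENS_MS = 750   # responses are only collected after this point
ITI_CHOICES_MS = [1900, 1950, 2000, 2050, 2100]


def stream_duration_ms():
    """Duration of one two-stimulus apparent motion stream."""
    return BURST_MS + ISI_MS + BURST_MS


def distractor_offset_ms(synchronous):
    """End of the distractor stream, measured from target onset.

    Assumption: the 500 ms delay is measured from target onset to
    distractor onset.
    """
    onset = 0 if synchronous else ASYNC_DELAY_MS
    return onset + stream_duration_ms()


def build_block(n_trials=96, seed=None):
    """Hypothetical balanced block; equal cell counts are an assumption."""
    cells = list(product(["synchronous", "asynchronous"],
                         ["congruent", "incongruent"],
                         ["leftward", "rightward"]))
    reps, remainder = divmod(n_trials, len(cells))
    if remainder:
        raise ValueError("n_trials must be a multiple of the number of cells")
    rng = random.Random(seed)
    trials = [
        {"synchrony": s, "congruency": c, "target_direction": d,
         "iti_ms": rng.choice(ITI_CHOICES_MS)}
        for (s, c, d) in cells
        for _ in range(reps)
    ]
    rng.shuffle(trials)
    return trials
```

Under these assumptions each stream lasts 200 ms, so an asynchronous distractor ends 700 ms after target onset, which is before responses start being collected at 750 ms.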

Results

The accuracy data were normalized using the arcsine transformation of the square root of the proportion obtained in each condition for each participant. This procedure converts binomially distributed data, such as proportions, into approximately normally distributed data, thus enabling parametric analysis of the data (Bartlett 1947). The transformed accuracy data were then submitted to a repeated measures analysis of variance (ANOVA) with Synchrony (synchronous vs. asynchronous), Sound Intensity (less vs. more intense), and Congruency (congruent vs. incongruent) as the within-participant factors. Post-hoc Bonferroni adjustments were calculated to further evaluate significance levels. The overall analysis revealed a significant main effect of Synchrony, F(1,19) = 78.77; p < 0.001, with the participants responding more accurately in the asynchronous condition than in the synchronous condition overall (M = 94 vs. 69%, respectively). The main effect of Congruency (measured as the difference in accuracy between the congruent and incongruent conditions) was also significant, F(1,19) = 133.83; p < 0.001, with participants responding more accurately in the congruent trials than in incongruent trials overall (M = 97 vs. 65%, respectively). There was a significant effect of Congruency in the synchronous condition (p < 0.001), but not in the asynchronous condition (p = 1.00) (mean congruency effect of 58 and 5%, respectively), giving rise to the significant interaction between Congruency and Synchrony, F(1,19) = 104.60; p < 0.001.
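
The arcsine transformation mentioned at the start of this section can be illustrated with a short sketch. The function name is hypothetical, and the example applies the transform to a single proportion rather than to the per-participant data, which are not available here.

```python
import math


def arcsine_sqrt(proportion):
    """Arcsine of the square root of a proportion, in radians.

    Maps proportions in [0, 1] onto [0, pi/2], stretching the scale near
    0 and 1, where binomial proportions have compressed variance.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


# A proportion of 0.5 sits at the midpoint of the transformed scale.
midpoint = arcsine_sqrt(0.5)
```

Each participant's proportion correct in each condition would be passed through such a function before being entered into the ANOVA.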

The three-way interaction between Synchrony, Sound Intensity, and Congruency was also significant, F(1,19) = 9.35; p = 0.006. In order to determine the cause of this interaction, we performed separate ANOVAs for each level of the Synchrony factor. The analysis of the data from the synchronous trials provided significant main effects of Sound Intensity, F(1,19) = 4.63; p = 0.044, and of Congruency, F(1,19) = 181.64; p < 0.001. The participants responded more accurately when presented with the less intense auditory distractors than with the more intense auditory distractors (M = 72 vs. 66% correct, respectively) and in congruent trials as compared to incongruent trials (M = 98 vs. 40% correct, respectively). The interaction between Sound Intensity and Congruency was not significant, F(1,19) = 2.98; p = 0.10. No significant terms emerged from the analysis of the data from the asynchronous trials. In fact, although the difference between the congruent and incongruent (less intense) trials had a magnitude of approximately 10% (see Fig. 2), the t test comparison between these two conditions failed to reach significance, t(19) = 1.84, p = 0.08.
https://static-content.springer.com/image/art%3A10.1007%2Fs00221-008-1637-9/MediaObjects/221_2008_1637_Fig2_HTML.gif
Fig. 2

Mean percentage of correct responses in the tactile direction discrimination task as a function of sound intensity and congruency in the synchronous (a) and asynchronous (b) conditions of experiment 1. The error bars represent the standard errors of the means

Perceived apparent motion ratings

The Likert scale ratings (see Table 1) of the perceived apparent motion for the tactile modality (presented with either less or more intense auditory distractors) were submitted to a paired-sample t test comparison. The analysis did not reveal any effect of Sound Intensity, t(19) = −1.07; p = 0.30. This result shows that our participants’ perception of tactile apparent motion was not significantly affected by changing the intensity of the auditory distractors (M = 4.9 vs. 5.1 for less vs. more intense distractor blocks, respectively).
Table 1

Summary of the Likert scale ratings

Experiment | Target modality | Sound intensity | Perceived apparent motion ratings | Response confidence ratings
1 | Tactile | Quiet | 4.90 (1.52) | 4.65 (1.27)
1 | Tactile | Loud | 5.10 (1.48) | 4.45 (1.32)
2 | Auditory | Quiet | 5.25 (1.25) | 4.55 (1.39)
2 | Auditory | Loud | 5.10 (1.48) | 5.80 (1.20)

The standard deviations of the mean are reported in parentheses

Response confidence ratings

The Likert scale self-confidence ratings (see Table 1) given in the less and more intense auditory distractor blocks were submitted to a paired-sample t test comparison. Once again, the test revealed no effect of Sound Intensity, t(19) = 0.08; p = 0.45. This result shows that our participants’ confidence in determining the direction of the tactile motion did not change significantly as a function of the intensity of the auditory distractors (M = 4.65 vs. 4.45 for less vs. more intense distractor blocks, respectively).
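
The paired-sample comparisons applied to the two sets of Likert ratings follow a simple computation, sketched here. The function name is hypothetical and the ratings in the example are invented for illustration, since the individual ratings are not reported.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired-sample t statistic and its degrees of freedom.

    The statistic is the mean of the within-participant differences
    divided by the standard error of those differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical ratings from four participants (quiet block, loud block).
quiet = [1, 2, 3, 4]
loud = [2, 2, 5, 5]
t_value, df = paired_t(quiet, loud)
```

With 20 participants, as in the experiments reported here, the same computation yields a statistic with 19 degrees of freedom.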

Experiment 2

Experiment 2 was designed to assess people’s ability to discriminate the direction of auditory streams as a function of the spatial congruency of the tactile distractor streams. The spatiotemporal relations between the target and distractor modalities were the same as for Experiment 1.

Participants

The same participants took part in both experiments.

Apparatus, materials, design, and procedure

These were the same as for Experiment 1, with the following exception: The participants now had to report the direction in which the auditory stimuli appeared to move whilst trying to ignore the tactile distractors (i.e., the roles of stimuli in the two modalities, as target and distractor, were reversed). Before starting the experiment the participants completed one block of 12 practice trials with less intense (75 dB(A)) or more intense (82 dB(A)) auditory stimuli presented in isolation (i.e., without the distracting tactile stimuli). Finally, the participants completed a second block of trials in which they had to respond to the direction of the unimodal auditory streams presented at the intensity not presented in the first unimodal block. The order of presentation of these unimodal blocks was counterbalanced across participants.

Results

First, the transformed accuracy data (see “Results of Experiment 1”) from the unimodal trials involving the presentation of more versus less intense sounds were submitted to a t test comparison. Sound Intensity affected participants’ performance, t(19) = −2.13; p = 0.046, resulting in more accurate responses for the more intense sounds than for the less intense sounds (100 vs. 98%, respectively). Next, the transformed data from the experimental blocks were submitted to a repeated measures ANOVA with Synchrony (synchronous vs. asynchronous), Sound Intensity (less vs. more intense), and Congruency (congruent vs. incongruent) as the within-participant factors. Post-hoc Bonferroni adjustments were used to determine significance levels. The analysis revealed a significant main effect of Synchrony, F(1,19) = 7.65; p = 0.012, with the participants responding more accurately in the asynchronous condition than in the synchronous condition (M = 98 vs. 96%, respectively). There was also a significant main effect of Congruency, F(1,19) = 34.93; p < 0.001, with participants responding more accurately in the congruent trials than in the incongruent trials overall (M = 99 vs. 95%, respectively). There was also a significant main effect of Sound Intensity, F(1,19) = 10.95; p = 0.004, with participants responding more accurately to the more intense auditory targets than to the less intense target stimuli (M = 99 vs. 96%, respectively). The Congruency effect (i.e., the mean difference between the congruent and incongruent conditions) was significant in the synchronous trials (p < 0.001), but not in the asynchronous trials (M = 6 vs. 1%; p = 1.00), giving rise to a significant interaction between Congruency and Synchrony, F(1,19) = 10.31; p = 0.005. The Congruency effect was larger in the less intense auditory target trials (p < 0.001) than in the more intense auditory target trials (p < 0.001; M = 5 vs. 1%, respectively), giving rise to a significant interaction between Congruency and Sound Intensity, F(1,19) = 7.38; p = 0.014. The three-way interaction among Synchrony, Sound Intensity, and Congruency was also significant, F(1,19) = 7.93; p = 0.011.
https://static-content.springer.com/image/art%3A10.1007%2Fs00221-008-1637-9/MediaObjects/221_2008_1637_Fig3_HTML.gif
Fig. 3

Mean percentage of correct responses in the auditory direction discrimination task as a function of sound intensity and congruency in the synchronous (a) and asynchronous (b) conditions of experiment 2. The error bars represent the standard errors of the means

Separate ANOVAs were performed at each level of the Synchrony factor. In the synchronous trials, the main effect of Sound Intensity was significant, F(1,19) = 10.45; p = 0.004, with participants responding more accurately to the more intense auditory targets than to the less intense targets (M = 99 vs. 93%, respectively). The main effect of Congruency was also significant, F(1,19) = 27.37; p < 0.001 (M = 93% in incongruent vs. 99% in congruent trials). When the two apparent motion streams were concurrently presented moving in opposite directions, the probability that the participants would report having perceived the auditory motion as moving in the wrong direction (i.e., in the direction in which the distractor motion happened to move) was significantly higher if the target auditory motion consisted of the less intense (p < 0.001), rather than the more intense sounds (p < 0.001; congruency effect of 11 and 1%, respectively), giving rise to a significant interaction between Sound Intensity and Congruency, F(1,19) = 12.25; p = 0.002.

No significant terms emerged from the analysis of the data from the asynchronous trials.
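
The congruency effects reported for the synchronous trials of the two experiments can be recomputed from the condition means given in the text. The sketch below uses only those reported means; the function and variable names are hypothetical.

```python
def congruency_effect(congruent_pct, incongruent_pct):
    """Accuracy cost of an incongruent distractor, in percentage points."""
    return congruent_pct - incongruent_pct


# Mean percent correct (congruent, incongruent) in the synchronous trials,
# as reported in the Results sections of Experiments 1 and 2.
SYNCHRONOUS_MEANS = {
    "Experiment 1 (tactile targets)": (98, 40),
    "Experiment 2 (auditory targets)": (99, 93),
}

effects = {
    label: congruency_effect(congruent, incongruent)
    for label, (congruent, incongruent) in SYNCHRONOUS_MEANS.items()
}

for label, effect in effects.items():
    print(f"{label}: congruency effect = {effect} percentage points")
```

This gives 58 percentage points for Experiment 1 and 6 for Experiment 2, in line with the larger capture effect reported for tactile targets.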

Perceived apparent motion ratings

The Likert scale ratings (see Table 1) reported for perceived auditory motion with the less intense and more intense sounds (M = 5.3 vs. 5.1, respectively) were submitted to a paired-sample t test comparison. The analysis did not reveal any effect of Sound Intensity, t(19) = 0.83; p = 0.42.

Response confidence ratings

The Likert scale self-confidence ratings (see Table 1) for the less intense and more intense auditory streams were submitted to a paired-sample t test comparison. The test revealed a significant effect of Sound Intensity, t(19) = −4.63; p < 0.001, with participants being more confident when discriminating the direction of the more intense than of the less intense auditory streams (M = 5.8 vs. 4.6, respectively).

Perceived apparent motion ratings (experiment 1 vs. 2)

The ratings of apparent motion given by the participants in Experiments 1 and 2 (see Table 1) were subjected to a repeated measures ANOVA with Modality (touch vs. hearing) and Sound Intensity (less vs. more intense) as the within-participants factors. Neither the main effect of Modality, F(1,19) = 0.39; p = 0.54, nor that of Sound Intensity, F(1,19) = 0.03; p = 0.86, nor their interaction, F(1,19) = 2.06; p = 0.17, reached significance.
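
Because both factors in this design have two levels, each F(1,19) in a 2 × 2 repeated measures ANOVA equals the square of a one-sample t statistic computed on a per-participant contrast. The following sketch illustrates that equivalence. The function names and the four hypothetical participants are assumptions for illustration, and the code is not the analysis software used in the study.

```python
import math
from statistics import mean, stdev


def contrast_f(scores):
    """F(1, n-1) for a contrast: the squared one-sample t against zero."""
    n = len(scores)
    sd = stdev(scores)
    if sd == 0:
        raise ValueError("contrast has zero variance; F is undefined")
    t = mean(scores) / (sd / math.sqrt(n))
    return t * t, (1, n - 1)


def rm_anova_2x2(cells):
    """Effects for a fully within-participant 2 x 2 design.

    `cells` holds one tuple per participant, ordered as
    (A1B1, A1B2, A2B1, A2B2), e.g. A = Modality, B = Sound Intensity.
    """
    main_a = [(x[0] + x[1]) - (x[2] + x[3]) for x in cells]
    main_b = [(x[0] + x[2]) - (x[1] + x[3]) for x in cells]
    interaction = [(x[0] - x[1]) - (x[2] - x[3]) for x in cells]
    return {
        "A": contrast_f(main_a),
        "B": contrast_f(main_b),
        "AxB": contrast_f(interaction),
    }


# Hypothetical ratings for four participants (not data from the study).
ratings = [(5, 6, 4, 6), (4, 5, 5, 7), (6, 6, 5, 6), (5, 7, 4, 5)]
result = rm_anova_2x2(ratings)
```

Each effect is tested against its own contrast-by-participant error term, which is why a single column of contrast scores is enough for each F value.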

Response confidence ratings (experiment 1 vs. 2)

Participants’ confidence rating responses in Experiments 1 and 2 (see Table 1) were subjected to a repeated measures ANOVA with Modality (touch vs. hearing) and Sound Intensity (less vs. more intense) as the within-participants factors. The analysis revealed a significant main effect of Modality, F(1,19) = 5.22; p = 0.034, with participants giving lower confidence ratings in the tactile target blocks than in the auditory target blocks (M = 4.6 vs. 5.2, respectively). The analysis also revealed a significant main effect of Sound Intensity, F(1,19) = 9.75; p = 0.006, with participants giving lower confidence ratings in blocks with less intense than with more intense auditory stimuli (M = 4.6 vs. 5.1, respectively). There was a significant interaction between Modality and Sound Intensity, F(1,19) = 12.69; p < 0.001, with significantly lower confidence ratings being given in the less intense than in the more intense auditory target blocks (M = 4.6 vs. 5.8; p < 0.001), while no significant difference was reported when the target modality was tactile (M = 4.7 vs. 4.5; n.s.).

General discussion

The novelty of the present study lies in the fact that it represents the first attempt to investigate whether participants’ performance in an audiotactile crossmodal dynamic capture task can be modulated by the intensity of the stimuli used. To this end, we manipulated the intensity of the sounds (i.e., 75 vs. 82 dB(A)), which served as the distractor stimuli in Experiment 1 and as the target stimuli in Experiment 2. Sound intensity was manipulated from block to block, with participants performing the tasks with the auditory stimuli being presented at both 75 and 82 dB(A). Note that the intensity of tactile stimulation was kept constant throughout. The results showed an effect of changing the sound intensity on performance both when the targets were presented tactually (Experiment 1) and when they were presented auditorily (Experiment 2).

Given that stimulus intensity is a dimension that is inherent to the stimuli presented in all sensory modalities, previous studies have investigated how this dimension can be differentially coded and compared across the senses. For instance, Marks has repeatedly tried to quantify the perceptual similarity of stimulus intensity across differing sensory modalities and to what extent these crossmodal equivalences can be considered absolute (e.g., see Marks 1988; Marks et al. 1986). Meanwhile, other researchers have investigated how the concurrent presentation of vibrotactile stimuli affects the perceived intensity of auditory stimuli. For example, Schürmann et al. (2004) have shown facilitatory interactions between simultaneously-presented auditory and tactile stimuli (i.e., enhanced audiotactile multisensory interactions have been reported at low auditory and tactile stimulus intensities). Similarly, Gillmeister and Eimer (2007) have recently shown that the tactile enhancement of auditory loudness is more pronounced when sounds are presented at a lower stimulus intensity, and that this effect declines with increasing auditory intensity. The most important difference between these previous studies and the present one lies in the methodology adopted: in previous studies, the participants were explicitly required to judge the intensity of the perceived stimuli. By contrast, in the present study, the effect of stimulus intensity on performance was tested indirectly by comparing the magnitude of the observed crossmodal dynamic capture effect, by varying from block to block both the intensity of the sound (i.e., 75 vs. 82 dB(A)) and the target modality (i.e., tactile or auditory).

The results of Experiment 1 revealed that the capture effect on tactile target motion was modulated significantly as a function of the relative intensity of the auditory distractor motion, with more intense auditory distractors exerting a stronger crossmodal capture effect than less intense auditory distractors. This result would seem to contradict the well-known law of inverse effectiveness, according to which maximal crossmodal interactions take place when the two stimulus components are themselves minimally effective (e.g., Stein and Meredith 1993; Stein and Stanford 2008). One may speculate as to whether in the present study the intensity of the sound might somehow have modulated participants’ perception of auditory apparent motion and, consequently, the crossmodal capture that the auditory streams were capable of exerting over the tactile streams. However, participants’ self-reports did not exhibit any sign of a change in the impression of the tactile apparent motion as a function of the change in sound intensity (see “Experiment 1”, Results section, and Table 1), so this account does not seem to provide an adequate explanation for our results. It seems reasonable therefore to assume that the process underlying our results involves a different mechanism, such as, for example, a shift of the focus of attention resulting from the change of sound intensity. If this were the case, attention would have been shifted toward the higher intensity sounds, thus increasing their capability to capture the motion of the tactile stream. This explanation would be consistent with previous electrophysiological studies that have documented a larger-amplitude P300 signal with higher intensity stimuli, which may reflect an increase in attention elicited by more intense stimuli (see Lindín et al. 2005).

The role of attention in the crossmodal dynamic capture task has been investigated recently by Oruc et al. (2008; see also Huddleston et al. 2008). In their study, three different attentional conditions were introduced: the modality pairings were held constant across the block of trials, but the target modality could be known in advance (Blocked group), identified by a pre-cue at the start of each trial (Pre-cued group), or identified by a post-cue after each stimulus presentation (Post-cued group). Hence, in contrast to the other groups, the Post-cued group had to attend to both modality streams on each trial. The results showed that the attentional manipulations did not significantly affect the discrimination of the visual motion when paired with either tactile or auditory irrelevant motion, suggesting a robust advantage of the visual modality in conveying dynamic information. In the case of the audiotactile pairing, the dynamic capture effect was found to be not only reciprocal, but also influenced by participants’ attentional focus. Namely, the crossmodal dynamic capture effect increased when the participants were requested to attend to both dynamic streams as compared to when attention was only focused on the target modality. These results suggest that attention selectively affects the modalities that convey motion information of comparable magnitude, such as audition and touch.

In Experiment 2, the ability of participants to correctly report the direction of auditory apparent motion was significantly better overall if the stimuli were presented at 82 dB(A) than at 75 dB(A). Even more interestingly, the accuracy of participants in determining the direction of an auditory apparent motion display presented concurrently with task-irrelevant tactile motion moving in the opposite direction (i.e., the magnitude of the crossmodal dynamic capture effect) varied significantly as a function of the intensity of the auditory stimuli. This result means that the congruency effect exerted by the tactile motion on the perceived direction of the auditory motion was significantly stronger for less intense than for more intense auditory stimuli. This was also mirrored by participants’ self-reports, as they claimed to be less confident in determining the direction of the less (vs. more) intense auditory stimuli (see the “Results” of Experiment 2, and Table 1).

The pattern of results reported in the present study can be interpreted by taking into account the relative reliability of the two modalities involved. In one of their studies, Bresciani and Ernst (2007) presented series of beeps and taps and requested participants to report the number of stimuli that had been presented in the target modality while ignoring the distractors presented in the other modality. According to the maximum likelihood estimation model, the reliability of a sensory channel is related to the relative uncertainty of the information it conveys: the higher the relative variance of a sensory modality, the weaker its relative reliability (Ernst and Bülthoff 2004). To manipulate the reliability of the auditory information, the auditory stimuli were presented at 41 or at 74 dB (signal-to-noise ratio of, respectively, −30 and 3 dB). Bresciani and Ernst found that reducing the intensity of the auditory stimuli decreased the relative reliability of the auditory modality, with participants being more sensitive (i.e., their estimates were less variable) in counting the number of the more intense (vs. less intense) beeps presented with irrelevant taps and, conversely, in counting the number of taps presented with irrelevant less intense (vs. more intense) beeps (see also Wozny et al. 2008). Although there are several methodological differences between the previous study and the present one (i.e., the kind of task and the relative difference in intensities of the auditory stimuli), these two studies are thus consistent in showing a modulation of performance according to the change in the level of intensity of the information provided by the auditory modality.
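The reliability weighting described above can be made concrete with a minimal sketch. It assumes the standard form of the maximum likelihood estimation model, in which each cue is weighted in proportion to its inverse variance (Ernst and Bülthoff 2004). The function names and the variance values are hypothetical illustrations; they are not taken from Bresciani and Ernst (2007) or from the present data.

```python
def mle_weights(var_auditory, var_tactile):
    """Return (w_auditory, w_tactile) under inverse-variance weighting.

    Reliability is defined as 1 / variance; each weight is the cue's
    reliability divided by the summed reliability of both cues.
    """
    rel_a = 1.0 / var_auditory
    rel_t = 1.0 / var_tactile
    total = rel_a + rel_t
    return rel_a / total, rel_t / total


def mle_combine(est_auditory, var_auditory, est_tactile, var_tactile):
    """Return the weighted estimate and its (reduced) variance."""
    w_a, w_t = mle_weights(var_auditory, var_tactile)
    combined = w_a * est_auditory + w_t * est_tactile
    combined_var = 1.0 / (1.0 / var_auditory + 1.0 / var_tactile)
    return combined, combined_var


# Hypothetical variances: a reliable (e.g., more intense) auditory signal
# versus a noisier (e.g., less intense) one, with touch held constant.
print(mle_weights(var_auditory=1.0, var_tactile=4.0))
print(mle_weights(var_auditory=4.0, var_tactile=4.0))
```

In this illustration, raising the auditory variance shifts weight toward touch, which is the direction of the effect discussed above: a less reliable auditory signal leaves more room for the tactile stream to influence the combined percept.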

Another result to have emerged from the present study is that the overall accuracy of participants’ performance was lower in Experiment 1 than in Experiment 2. Thus, it seems likely that the discrimination of the direction of the target motion stream was harder when it was presented in the tactile (vs. auditory) modality and the distractors were presented in the auditory (vs. tactile) modality. Note, however, that according to participants’ self-report ratings, the apparent motion presented in the two modalities was comparable in terms of its strength (see “Results”, Experiment 1 vs. 2). One might attribute this result to the fact that the intensity might not have been equated across the two modalities. Although there is evidence to show that the crossmodal matching of intensity is by no means a simple issue, given that it is susceptible to biases and to great individual variability and is thus difficult to assess (cf. Marks et al. 1986; Spence et al. 2001), it is not possible to exclude the possibility that the lack of such matching could have contributed to this result.

It is also worth noting that the average rating for the impression of apparent motion experienced by the participants when the target stream was presented in the auditory modality and the distractors in touch was, although present, not very high (fluctuating around a rating of 5). One may speculate as to whether the effect explored here somehow requires stimulus conditions that render the perception of the target motion “fragile” and whether it would thus disappear with a more robust impression of motion. To address this point more specifically, we further analyzed our data by calculating the median value of the ratings for the “goodness of motion” reported by all participants for each target modality (i.e., auditory or tactile), which was equal to 5. We then performed separate analyses for those participants whose mean ratings were higher than 5 and for participants whose ratings were lower than 5 (10 participants in each group; each participant judging 128 trials in total). This analysis revealed that the magnitude of the capture effect did not differ between the two groups (i.e., the interaction between the Group and Congruency factors was not significant, in either the auditory target blocks, F(1,18) = 0.25; p = 0.62, or in the tactile target blocks, F(1,18) = 3.05; p = 0.10). Although these data have to be considered with some caution (as they could possibly reflect response bias), they nevertheless suggest that the strength of participants’ impression of apparent motion did not affect how the modalities interacted (at least within the range of stimulus values tested in the present study).
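The median-split step described above can be sketched as follows. This is an illustration only: the participant identifiers, ratings, and accuracy values are made up, the function names are hypothetical, and the capture effect is assumed to be indexed as accuracy on congruent trials minus accuracy on incongruent trials. Participants whose rating equals the median are left out of both groups, mirroring the "higher than" and "lower than" wording. The Group by Congruency interaction test itself is not reproduced here.

```python
from statistics import mean, median


def median_split(ratings):
    """Split participants by mean rating relative to the group median.

    ratings: dict mapping participant id -> mean "goodness of motion" rating.
    Returns (cut, high_ids, low_ids); ratings equal to the cut are excluded.
    """
    cut = median(ratings.values())
    high_ids = sorted(p for p, r in ratings.items() if r > cut)
    low_ids = sorted(p for p, r in ratings.items() if r < cut)
    return cut, high_ids, low_ids


def capture_effect(acc_congruent, acc_incongruent):
    """Congruency (capture) effect, assumed here to be a simple difference."""
    return acc_congruent - acc_incongruent


# Made-up data for four hypothetical participants.
ratings = {"p1": 3.0, "p2": 4.5, "p3": 5.5, "p4": 6.0}
accuracy = {  # (proportion correct congruent, proportion correct incongruent)
    "p1": (0.95, 0.85), "p2": (0.90, 0.70),
    "p3": (0.95, 0.80), "p4": (0.90, 0.65),
}

cut, high_ids, low_ids = median_split(ratings)
effects = {p: capture_effect(*accuracy[p]) for p in accuracy}
print(cut, high_ids, low_ids)
print(mean(effects[p] for p in high_ids), mean(effects[p] for p in low_ids))
```

Comparing the two group means gives only a descriptive picture; whether the groups differ reliably is what the interaction term reported above addresses.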

Those studies that have investigated how intramodal visual perceptual grouping modulates both audiovisual (Sanabria et al. 2005) and visuotactile (Lyons et al. 2006) motion information have provided somewhat discordant results. In both studies, the task-irrelevant visual motion consisted of either six or two lights. The results consistently showed that the six-lights condition resulted in a significantly weaker crossmodal capture of the perceived direction of auditory and tactile motion than did the two-lights condition. This result stands in apparent contrast with the evidence that the six-lights condition conveyed a stronger impression of visual apparent motion than the two-lights condition. These outcomes have been interpreted in terms of the six-lights condition being more likely to be segregated into two moving streams than the two-lights condition, thus resulting in a weaker crossmodal dynamic capture effect. Hence, the magnitude of crossmodal integration seems to be a more salient factor in determining the crossmodal dynamic capture effect than the strength of apparent motion per se. As yet, no attempt has been made to investigate this topic in the audiotactile domain.

If it is true that the more the two streams are segregated, the weaker the dynamic capture effect, then one might speculate that presenting the task-irrelevant tactile stimuli to more than one fingertip of one hand (or presenting the stimuli to more than one fingertip of each hand) would result in a decrease (or perhaps even the disappearance) of the already weak capture of the auditory targets. Conversely, presenting the auditory stimuli from more than two loudspeakers would cause a reduction of the crossmodal capture of the tactile stimuli by the auditory stimuli.

It is worth noting, however, that in the two above-mentioned studies investigating the effect of intramodal perceptual grouping on the crossmodal dynamic capture effect (Lyons et al. 2006; Sanabria et al. 2005), the impression of apparent motion was only manipulated in the task-irrelevant modality (i.e., the target streams always consisted of two stimuli). To the best of our knowledge, no previous study has investigated the magnitude of the crossmodal dynamic capture occurring between task-relevant and -irrelevant streams of stimuli, each consisting of more than two stimuli. Such a manipulation would allow one to directly investigate the extent, if any, to which increasing the impression of apparent motion in spatiotemporally matched target and task-irrelevant moving streams would affect the magnitude of the dynamic capture effect.

It must also be noted that the crossmodal capture effect of the tactile apparent motion on the auditory apparent motion was, although significant, quite small and independent of the intensity at which the auditory stimuli were presented. This result contrasts with those of previous audiotactile crossmodal dynamic capture studies, where it has been reported that tactile apparent motion exerts a stronger capture effect on auditory motion than vice versa (Sanabria et al. 2005; Soto-Faraco et al. 2004a). It is worth noting here that a meaningful comparison with the results obtained in previous studies is hard to make because the intensities of the auditory stimuli used in the present study (i.e., 75 and 82 dB(A)) are higher overall than those used before (i.e., 60 dB(A) in Sanabria et al. 2005; 65 dB(A) in Soto-Faraco et al. 2004a). Furthermore, the nature of the sounds was also different (i.e., white noise bursts in the present study vs. pure tones in previous studies), and this may constitute an additional reason for the inconsistency. However, another study conducted in this laboratory, which aimed to explore the effect of sound complexity on audiotactile crossmodal dynamic capture, provided a pattern of results comparable to the one observed in the present study (see Occelli et al. submitted). The results of that study show not only that the complexity of the sounds has a significant influence on participants’ performance (i.e., more accurate judgments of the direction of, respectively, the tactile streams presented concurrently with pure tone distractors and of the auditory streams consisting of white noise bursts), but also that the capture effect exerted by the auditory distractors was larger than that exerted by the tactile distractors, thus bolstering the results reported here.

Taken together, the results of the present study suggest that changing one of the physical properties of the auditory stimuli, such as their intensity, can affect people’s performance in an audiotactile crossmodal dynamic capture task. It will be interesting in future research to investigate whether similar outcomes will emerge if other stimulus features are varied instead, such as the strength of apparent motion (by using, for instance, a higher number of tactile stimulators and loudspeakers), or even whether the synaesthetic congruency between the auditory and tactile stimuli might also modulate the size of the crossmodal dynamic capture effect (cf. Parise and Spence 2008).

Footnotes
1

Ten blindfolded control participants (5 males and 5 females; mean age of 25 years; range from 21 to 31 years) were asked to evaluate whether the auditory stimuli (12 trials presented in the absence of any distractors) presented at the two intensities differed from each other. Six of the participants could not perceive any difference; two perceived a difference in a dimension other than the one manipulated (i.e., the more intense stimuli were perceived as being “quicker” or “more similar” than the less intense stimuli); and one perceived a difference but was unable to verbalize in which way the stimuli differed. Only one participant reported having perceived that the stimuli differed in intensity. Thus, the two sound intensities used in the present study did not differ in terms of their perceived loudness.

 
2

To test whether the impression of apparent motion could be conveyed by the kind of stimulation utilized in the present study, we asked ten blindfolded control participants (5 males and 5 females; mean age of 25 years; range from 21 to 31 years) to rate the strength of apparent motion following unimodal stimulation (i.e., 12 streams presented in isolation). The mean apparent motion ratings were 4.1 (range: 1–6) for the tactile motion, 4.0 (range: 1–6) for the less intense auditory motion, and 4.7 (range: 1–6) for the more intense auditory motion, showing that some impression of motion was present (cf. Sanabria et al. 2005, for similar results). The fact that the impression of apparent motion experienced in the tactile modality was, on average, weaker than that conveyed by the auditory modality (although note that the stimulation was spatiotemporally similar in both modalities) raises a concern about the judgments of tactile motion and, indirectly, about the specific kind of tactile stimulation used in the present study. It is plausible, for instance, that the tactile stimulation typically used in the crossmodal dynamic capture paradigm only conveys a relatively weak impression of apparent motion (see Sanabria et al. 2005, for just such a claim). This may have been due to the fact that only two tactile stimuli were used (the sensation of tactile movement is more likely to be conveyed when more than two stimulators are used; Kirman 1974) and that they were applied to sites which were not only spatially disparate, but were also located on two different parts (sides) of the body. As has been shown in other studies on apparent motion, with sparse tactile stimulation participants may be able to infer the direction of the tactile stream by relying on the spatial position and sequence of the stimuli, without actually experiencing any impression of motion (Lakatos and Shepard 1997).
Thus, the possibility that the participants in the present study could have relied on the temporal order of the tactile stimuli instead of on the tactile motion cannot be excluded. Future investigations on this topic should thus involve a different kind of tactile stimulation and/or methods to control for this bias.

 
3

In contrast to other studies on auditory and tactile apparent motion that have investigated participants’ impression of motion only informally (cf. Lakatos and Shepard 1997), and previous studies on the crossmodal dynamic capture effect, which did not investigate this topic, we chose to describe the participants’ self-reports about the impression of motion experienced while performing the task. The reasoning behind this choice was twofold, aimed on the one hand at providing a formal measure of the impression of apparent motion vs. succession of tactile stimulation and, on the other, at exploring whether the impression of movement in the different sensory modalities was qualitatively similar. Note, however, that the fact that participants had to provide their ratings after the end of each experimental block means that they had to average out both the impression of apparent motion and of response confidence experienced across all the various trials composing the block. This procedure, although possibly inducing a partial distortion of the judgments, was chosen in order to avoid making the experimental session last too long.

 

Acknowledgments

M.Z. was supported by a returning grant “Rientro dei cervelli” from the MURST (Italy).

Copyright information

© Springer-Verlag 2008