On quantifying multisensory interaction effects in reaction time and detection rate
Rach, S., Diederich, A., & Colonius, H. (2011). Psychological Research, 75, 77. doi:10.1007/s00426-010-0289-0
Both mean reaction time (RT) and detection rate (DR) are important measures for assessing the amount of multisensory interaction occurring in crossmodal experiments, but they are often applied separately. Here we demonstrate that measuring multisensory performance using either RT or DR alone misses out on important information. We suggest integrating RT and DR into combined measures of multisensory performance: the first index (MRE*) is based on an arithmetic combination of RT and DR; the second (MPE) is constructed from parameters derived from fitting a sequential sampling model to RT and DR data simultaneously. Our approach is illustrated by data from two audio–visual experiments. In the first, a redundant targets detection experiment using stimuli of different intensity, both measures yield similar patterns of results supporting the “principle of inverse effectiveness”. The second experiment, introducing stimulus onset asynchrony and differing instructions (focused attention vs. redundant targets task), further supports the usefulness of both indices. Statistical properties of both measures are investigated via bootstrapping procedures.
Keywords: Audio–visual interaction · Redundant target paradigm · Inverse effectiveness · Sequential sampling model · Inverse efficiency scores · Multisensory performance enhancement
Reaction times (RT) and response frequencies are data commonly obtained in the behavioral sciences; mean RT and detection rate (DR), for instance, are accepted measures of performance in areas like sensation and perception. Often, these measures are studied in isolation, but sometimes it is more appropriate to consider them jointly, in particular when stimuli are very weak. Near the detection threshold, both RT and DR are known to change with stimulus intensity (see, e.g., Luce, 1986; Woodworth & Schlosberg, 1956 for a summary), and it is important to know whether an improvement in one measure, say RT, indicates a change in overall performance rather than being compensated by a worsening in the other measure, DR. A general finding from studying such speed–accuracy tradeoffs is that speeded responses tend to be less accurate; that is, a decrease in mean RT is often accompanied by a decreased DR (see, e.g., Luce, 1986 for a summary).
However, this phenomenon is not limited to stimuli near the threshold, as recently demonstrated by Arieh and Marks (2008) in a multisensory identification task. Multisensory tasks involve stimuli from two or more modalities, and a common finding, the so-called intersensory facilitation effect (IFE, Hershenson 1962), is that mean RT to crossmodal stimuli (e.g., light and tone) tends to be faster than to unimodal stimuli (e.g., light). A “true” IFE is observed if the speeded reaction cannot be accounted for by other mechanisms, such as statistical facilitation or response bias. Arieh and Marks (2008) evaluated the amount of multisensory interaction in a speeded identification of color with and without the presence of noise using speed–accuracy tradeoff functions (SATFs; Luce, 1986). Their results suggest that the facilitation of RT in audio–visual conditions is due to a change in the decision criterion induced by the auditory stimulus rather than to an increase of overall performance caused by the auditory stimulus. Lowering the criterion means that the participant responds on the basis of less information, thereby speeding the response but reducing its accuracy.
These results demonstrate that studying multisensory interaction based on RT alone may lead to wrong conclusions. In this paper, we introduce two ways of quantifying overall performance by integrating RT and DR recorded in simple detection tasks. The first measure will make use of an arithmetic combination of RT and DR, namely, inverse efficiency scores (Townsend & Ashby, 1983). The second measure will utilize a sequential sampling model (see, e.g., Luce, 1986 for a summary) to quantify overall performance.
In the following, a brief overview of the quantification of multisensory effects is given, and both the descriptive and the model-based overall performance indices are briefly outlined. Then, a multisensory detection experiment is presented, and the new overall performance indices are introduced in more detail before applying them to the results of the detection experiment and demonstrating the evaluation of differences in overall performance via non-parametric bootstrapping. After the presentation of the second experiment, which compares the influence of experimental instructions on overall performance in a detection task, we demonstrate how to adapt the new performance indices to experiments involving different instructions and stimulus onset asynchronies.
Quantification of multisensory interaction effects
Measures of multisensory speedup and detectability
Adaptive behavior in real world situations requires an organism to adequately combine cues from different sensory modalities. Especially in ambiguous or noisy situations (e.g., imagine a walker in a dark park), interpretation of vague information from one sensory modality can greatly be enhanced by further information delivered from other senses. The behavioral consequences of multisensory interaction have been the subject of a large body of research on both humans and animals. In the past hundred years, multisensory research has concentrated mainly on two behavioral measures, RT and, although to a lesser degree, detectability indices.
RT to a visual stimulus tends to be faster when an auditory stimulus is presented in close temporal and spatial proximity (cf., IFE, Hershenson, 1962, or redundant targets effect, e.g., Miller, 1982). This effect proved robust in various behavioral replications when participants were instructed to respond to any stimulus they perceive (i.e., a redundant target paradigm, RTP, e.g., Diederich & Colonius, 1987; Gielen, Schmidt, & Van den Heuvel, 1983; Hershenson, 1962; Miller, 1982), as well as when participants were instructed to respond only to stimuli of a predefined modality (target stimuli) and to ignore any other stimuli (i.e., a focused attention paradigm, FAP, e.g., Bernstein, Clark, & Edelstein, 1969b; Morrell, 1968; Rach & Diederich, 2006). The magnitude of IFE is modulated by the spatial and temporal alignment of the stimuli: it decreases with increasing temporal separation (“temporal rule”, Bernstein, Clark, & Edelstein, 1969a; Bernstein, Rose, & Ashe, 1970; Diederich & Colonius, 1987, 2004; Giray & Ulrich, 1993; Hershenson, 1962; Miller, 1986; Morrell, 1968), as well as with increasing spatial separation (“spatial rule”, Amlôt, Walker, Driver, & Spence, 2003; Arndt & Colonius, 2003; Bernstein & Edelstein, 1971; Colonius & Diederich, 2004; Frens, Van Opstal, & Van der Willigen, 1995; Harrington & Peck, 1998; Walker, Deubel, Schneider, & Findlay, 1997). Moreover, the amount of IFE is larger when stimuli are less intense (“principle of inverse effectiveness”, POIE, Corneil, Van Wanrooij, Munoz, & Van Opstal 2002; Diederich & Colonius, 2004; Rach & Diederich, 2006; see Holmes, 2009 for a critical view).
In addition to RT, multisensory interaction also shows up in detectability of stimuli. A task-irrelevant auditory stimulus can modulate visual perception (Bolognini, Frassinetti, Serino, & Làdavas, 2005; Frassinetti, Bolognini, & Làdavas, 2002), a task-irrelevant visual stimulus can enhance auditory perception (Lovelace, Stein, & Wallace, 2003), and task-irrelevant tactile stimuli can improve auditory detection (Gillmeister & Eimer, 2007). The amount of IFE, in terms of change in detectability, can also be modulated by the spatio-temporal alignment of stimuli (Bolognini et al., 2005; Frassinetti et al., 2002).
From a methodological point of view, all these findings rely on the ability to quantify an organism’s performance in different experimental conditions. If we want to assess and compare the amount of multisensory interaction, it is necessary to compute measures that relate performance in unimodal conditions to performance in crossmodal ones. RT and DR measures can be computed under both conditions indicating the change in performance in crossmodal conditions relative to that in unimodal ones.
In the following, we consider two principally different ways of quantifying IFE from RT and DR. One measure is based on an arithmetical combination of RT and DR, whereas the other utilizes sequential sampling models to combine RT and DR and to provide model parameters from which a multisensory performance enhancement (MPE) measure is derived.
Inverse efficiency scores
In a simple RT experiment, a certain percentage of stimuli will be missed by the participant if the intensity level is weak enough. With increasing intensity, the percentage of missed stimuli will go down and, at the same time, mean RT will also decrease (e.g., Luce, 1986 for a review), resulting in an improved overall performance. However, a decrease in the percentage of misses could also be due to the participant being more careful, at the expense of taking more time to evaluate the stimuli (i.e., an increased mean RT) resulting in a speed–accuracy tradeoff (see, e.g., Luce, 1986 for a summary) without a change in overall performance. Furthermore, an increased carefulness of the participant could also result in an increased mean RT and a decreased detection rate due to an increased evidence threshold, resulting in a decreased overall performance. To differentiate between these possibilities, Townsend & Ashby (1983) introduced a measure combining accuracy and RT (in a choice task) by dividing mean RT by the percentage of correct responses. With this correction, which has later been termed inverse efficiency scores (IES), RTs are inflated in proportion to the error rate. Any difference in IESs between conditions is interpreted as a difference in overall performance; on the other hand, an IES invariant under differing mean RTs and choice frequencies is considered as evidence for a speed–accuracy tradeoff. In multisensory research, IES has been used to correct RT under low accuracy (e.g., Kitagawa & Spence, 2005; Röder, Kusmierek, Spence, & Schicke, 2007; Shore, Barnes, & Spence, 2006; Spence, McGlone, Kettenmann, & Kobal, 2001).
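The Townsend and Ashby (1983) correction described above can be sketched in a few lines. The function below implements the division of mean RT by accuracy; all numerical values are hypothetical and serve only to illustrate how a lower detection rate inflates the score:

```python
def inverse_efficiency_score(mean_rt_ms, detection_rate):
    """Inverse efficiency score (IES): mean RT divided by the proportion
    of correct (here: detected) responses, so that RTs are inflated in
    proportion to the error rate (Townsend & Ashby, 1983)."""
    if not 0.0 < detection_rate <= 1.0:
        raise ValueError("detection rate must lie in (0, 1]")
    return mean_rt_ms / detection_rate

# Hypothetical numbers: identical mean RT, different detection rates.
# The lower the detection rate, the more the IES inflates the RT.
ies_a = inverse_efficiency_score(400.0, 0.95)  # ≈ 421.1 ms
ies_b = inverse_efficiency_score(400.0, 0.80)  # 500.0 ms
```

If two conditions differ in mean RT but yield the same IES, this invariance would be read as a speed–accuracy tradeoff rather than a change in overall performance.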
Multisensory performance enhancement
In a simple detection task with weak stimuli, participants are forced to make a decision in each trial. Given some evidence for the presence or absence of a stimulus, they have to decide whether or not to press the response button (cf., Luce, 1986, p. 140).
It is quite obvious that the presented multisensory performance measures, MRE* and MPE, differ in their theoretical foundation. Although established empirically, the theoretical background of MRE* is rather weak: the calculation of IES is somewhat ad hoc, without a specific justification for dividing RT by DR. In particular, the absolute magnitude of the “correction” applied to RT by this procedure will depend on the magnitude of both RT and DR. On the other hand, the sequential sampling models on which MPE is based are theoretically elaborate, and there are analogies between the postulated mechanisms and the accumulation of neural activation found in organisms (cf., Diederich, 1995, 202). Sequential sampling models are frequently applied to RT and accuracy measures (e.g., Diederich, 1995; Diederich & Busemeyer, 2003, 2006).
To investigate the properties of the new measures, MRE* and MPE, we conducted two experiments with visual, auditory, and audio–visual stimuli of different intensities, followed by some cross-validation studies.
A simple detection experiment with auditory and visual stimuli of several intensities near to the detection threshold was conducted. In concordance with the literature, we expect (1) RT to decrease with increasing stimulus intensity, (2) DR to increase with increasing stimulus intensity, and (3) performance in the bimodal condition to exceed performance in either unimodal condition.
Six students (ages 20–23 years, 3 female) served as participants and were paid for participation. All of them reported normal vision and hearing. Prior to their inclusion in this study, they were informed about the procedure and gave their informed consent. The experiment was conducted in accordance with the ethical standards described in the 1964 Declaration of Helsinki.
The study was conducted in a completely darkened, sound-attenuated room. Participants were seated in front of a black desk (180 × 130 × 75 cm), with their head supported by a chin rest attached to the front edge of the desk.
Mounted on the desk, two red light-emitting diodes (LED, ø 5 mm) placed 20° to the left or right of a central fixation point, marked by a third LED (fixation LED, red, ø 5 mm, 25 mA, 5.95 mcd), presented the visual stimuli. The three LEDs were arranged on a circle with a diameter of 35 cm centered on the base of the chin rest. Auditory stimuli were presented by two speakers (Canton Plus XS) placed at the participant’s ear level, 20° to the left or right of the fixation LED. A PC multifunction card was used to control the LEDs and speakers.
Responses were recorded using a button operated by the large toe. The toe rested on the button and was to be lifted in order to activate the button. This foot device was used because this experiment was part of a larger study that also employed tactile stimuli applied to the palms.
Experiment 1: intensities utilized given as luminance (mcd) of visual and loudness (dBA) of auditory stimuli
Participants were instructed to respond to every stimulus they detect regardless of its modality by lifting their toe as quickly as possible (redundant target paradigm). To keep participants from responding to the offset of the fixation LED, it was emphasized that the task aims at determining interindividual differences in perception, rather than perfect detection performance. In particular, it was underlined that some of the stimuli are very weak and unlikely to be detected at all.
A recording session of 45 min included two blocks separated by a 5-min break. Recording 32 trials for each of 24 conditions with each of 6 participants resulted in a total of 4,608 trials.
Data recording and preprocessing
A PC connected to the EyeLink was used for data storage and data preprocessing. Trials with RTs faster than 80 ms were classified as anticipation errors (0.9%) and treated as trials without a response, as were trials with RTs longer than 1,400 ms (misses, 0.6%). RTs showed no systematic difference between the left and right hemifields of stimulus presentation and were therefore combined across hemifields.
Experiment 1: mean reaction time (with standard error) and detection rate as a function of stimulus intensity and sensory modality
For intensity levels I4–I7, MRE was greater than zero, indicating that responses to bimodal presentation outperformed the best unimodal response in terms of speed. For the remaining intensity levels, I1–I3 and I8, the bimodal RT did not undercut the fastest unimodal RT, and therefore MRE was about zero or even slightly negative; i.e., bimodal stimulation did not speed up reactions compared to the fastest unimodal condition. MDE exhibited an almost opposite pattern. Positive values of MDE were found for intensity levels I1–I5, indicating enhanced detectability in bimodal conditions. MDE was about zero for intensity levels I6–I8, which is no surprise since DR was already close to 1 in the unimodal conditions; that is, a ceiling effect was observed here.
To summarize, quantifying the magnitude of IFE with respect to either only RT (i.e., in terms of MRE) or only DR (i.e., in terms of MDE) led to contrary results: MRE indicated multisensory facilitation in conditions where MDE indicated none, and vice versa. To integrate these opposite findings inferred from RT and DR, we evaluated overall performance in terms of MRE* and MPE.
MRE* calculated from inverse efficiency scores
For all three modalities, RT* decreased with increasing stimulus intensity, indicating that overall performance increased with stimulus intensity. Compared to RT, this decrease was much steeper for RT* (note the differently scaled y-axes in panel a of Figs. 3, 4). For intensity conditions I1–I6, bimodal RT* was lower than either of the unimodal ones; for I7 and I8, bimodal and auditory RT* were about the same. MRE*, the relative enhancement calculated from mean RT*, was larger than zero for all conditions, indicating multisensory facilitation across all intensity conditions. Moreover, MRE* decreased with increasing stimulus intensity (see Fig. 4, panel b): for the lowest intensity level I1, MRE* was 26.6%; for the highest intensity level I8, it was 1.7%. Note that this pattern is in accordance with the POIE.
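The computation of MRE* can be sketched as follows, assuming the standard relative-enhancement formula (bimodal RT* compared against the better, i.e., smaller, unimodal RT*); the RT* values below are hypothetical:

```python
def mre_star(rt_star_visual, rt_star_auditory, rt_star_bimodal):
    """MRE* (%): relative enhancement of the bimodal IES-corrected RT
    over the faster (smaller) unimodal one. The formula is the standard
    relative-enhancement expression, assumed here for illustration."""
    best_unimodal = min(rt_star_visual, rt_star_auditory)
    return 100.0 * (best_unimodal - rt_star_bimodal) / best_unimodal

# Hypothetical RT* values (ms divided by detection rate): a bimodal RT*
# below the best unimodal RT* yields a positive MRE*, i.e., facilitation.
enhancement = mre_star(620.0, 540.0, 459.0)  # 15.0%
```

A value of zero (or below) would indicate that bimodal stimulation did not improve IES-corrected performance over the best unimodal condition.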
MPE calculated from drift rates
To calculate MPE, we fitted a sequential sampling model with two absorbing boundaries to the data. Such models are often applied to two-alternative choice tasks to account for RTs and choice probabilities (Diederich, 1997, 2008; Diederich & Busemeyer, 2003, 2006; Ratcliff & Smith, 2004). However, it is important to note one difference between the typical two-alternative sequential sampling model and the version utilized here. In a choice experiment with two alternatives, say A and B, three independent measures can be recorded: the choice probabilities for both alternatives (pA and pB = 1 − pA) as well as the corresponding RTs (RTA and RTB). In a detection task, only two independent measures can be observed: the detection probabilities (pdetected and pnot-detected = 1 − pdetected) and the mean RT for trials in which participants responded because a stimulus was detected. The RT on trials in which participants decided not to respond is not observable.
Different stochastic processes can be used to define a sequential sampling model; the Wiener process X(t) with drift and bias is considered here for simplicity of demonstration (Diederich & Busemeyer, 2003, 2006; Ratcliff & Smith, 2004). The Wiener process is determined by two parameters: the drift rate δ and the decision criterion θ. The decision criterion θ determines how much activation has to be accumulated in favor of one alternative until an absorbing boundary is reached. We assume θ to be the same for both alternatives. As soon as X(t) ≥ θ or X(t) ≤ −θ, a response is initiated, with t being called the first passage time (FPT). For constant drift rates, low values of θ result in faster mean FPTs, while large values result in slower mean FPTs.
The drift parameter δ represents the effectiveness of a stimulus. For a given boundary θ, a higher value of δ leads to a higher choice probability and a faster mean FPT for one alternative, say A, and, at the same time, to a lower choice probability and a slower mean FPT for the opposite alternative, say B. A decreased magnitude of δ has the opposite effect: the choice probability for alternative A decreases and its mean FPT increases, while the choice probability for alternative B increases and the respective mean FPT decreases. For detection tasks, A would be interpreted as the decision to respond, while B would be interpreted as the decision not to respond (cf. Gomez, Ratcliff, & Perea, 2007). Consequently, only RT for alternative A would be examined, ignoring RT for alternative B, because it is unobservable in this case.
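These dynamics can be illustrated with a small simulation, a discrete-time Euler approximation of the Wiener process with absorbing boundaries at ±θ. This is a sketch for intuition, not the authors' fitting procedure, and all parameter values are hypothetical:

```python
import numpy as np

def simulate_wiener(delta, theta, dt=0.005, sigma=1.0, n_trials=2000, seed=0):
    """Simulate a Wiener process X(t) with drift rate delta and absorbing
    boundaries at +theta ('respond') and -theta ('do not respond'),
    starting at X(0) = 0. Returns the probability of a 'respond' decision
    and the mean first passage time (in s) of the 'respond' trials."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)              # accumulated evidence per trial
    t = np.zeros(n_trials)              # elapsed time per trial
    active = np.ones(n_trials, dtype=bool)
    while active.any():
        n = int(active.sum())
        # Euler step: drift increment plus Gaussian diffusion noise
        x[active] += delta * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
        t[active] += dt
        active &= (x > -theta) & (x < theta)
    responded = x >= theta
    return responded.mean(), t[responded].mean()

# Hypothetical drift rates: a more effective stimulus (larger delta) should
# raise the probability of responding and shorten the mean first passage time.
p_weak, fpt_weak = simulate_wiener(delta=0.5, theta=1.0)
p_strong, fpt_strong = simulate_wiener(delta=2.0, theta=1.0)
```

Note that, as in the detection task described above, only the trials absorbed at the upper boundary contribute an observable "RT"; trials absorbed at the lower boundary correspond to unobservable no-response decisions.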
Furthermore, to determine the initial state of evidence, X(0) = β × θ, the parameter β was estimated. The initial state of evidence accounts for a priori differences in the amount of activation that needs to be accumulated for the two alternatives: X(0) = 0 indicates a process not favoring either alternative, X(0) > 0 indicates evidence in favor of one alternative, and X(0) < 0 indicates evidence in favor of the opposite one. For the application presented in this paper, the main purpose of introducing β is to prevent negative drift rates, which occur when probabilities are lower than 0.5.
Finally, a residual time Tr was estimated. It can be interpreted as base time, that is, the time taken by those non-decisional cognitive and motor processes that are not influenced by the experimental manipulations of interest, and that therefore remains constant across all experimental conditions (i.e., the component of the measured RTs that is independently and identically distributed across trials and across experimental conditions; cf., Townsend & Honey, 2007, pp. 259–260).
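Since the excerpt does not spell out the exact formula for MPE, the following sketch assumes that it is computed from the fitted drift rates analogously to the other relative-enhancement measures, i.e., as the relative gain of the bimodal drift rate over the larger unimodal one; this definition and all drift values are assumptions for illustration:

```python
def mpe_from_drift(delta_visual, delta_auditory, delta_bimodal):
    """Sketch of a multisensory performance enhancement (MPE) index from
    fitted drift rates, assuming a relative-enhancement definition
    analogous to MRE/MRE* (an assumption, not necessarily the authors'
    exact formula)."""
    best_unimodal = max(delta_visual, delta_auditory)
    return 100.0 * (delta_bimodal - best_unimodal) / best_unimodal

# Hypothetical drift rates: the bimodal drift exceeding the best unimodal
# drift yields a positive enhancement.
example = mpe_from_drift(0.4, 0.6, 0.9)  # 50.0%
```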
Note that for the lowest intensity level I1, where MPE indicated a response enhancement of 265.0%, the bar did not fit in the plot, which was kept at the same scale as those for MRE, MDE, and MRE* to allow for better comparison. MPE clearly decreased with increasing intensity levels; thus, evidence for the POIE was observed here.
MRE* and MPE exhibited very similar patterns. Both indices picked up characteristics from MRE and MDE. Like MDE, both measures indicated the largest enhancement for condition I1 and decreasing enhancement from I1 to I3, and, like MRE, both indices indicated a comparable amount of enhancement for conditions I4–I6. Nevertheless, this pattern is much more pronounced in the case of MPE. Since for both measures it was not clear whether the observed differences between intensity conditions were statistically meaningful, both were subjected to bootstrapping procedures.
Evaluation of differences in overall performance utilizing bootstrapping procedures
To allow for the examination of variability and the calculation of confidence intervals, the obtained performance indices were subjected to a non-parametric bootstrap procedure. The bootstrap is a Monte Carlo technique that generates simulated data sets by resampling from empirical data observed in the original experiment (Efron & Tibshirani, 1986; Wichmann & Hill, 2001). The non-parametric bootstrap samples simulated data sets by drawing with replacement from the original data and provides a distribution of simulated RTs and a distribution of simulated DRs. These distributions can be used to calculate confidence intervals. We will report 68% confidence intervals, CI68, calculated by the bootstrap percentile method, because they are comparable to common standard error bars. CI68 spans from the 16th to the 84th percentile of the bootstrap distribution, which approximately compares to the original estimate ± 1 standard deviation of a Gaussian (Wichmann & Hill, 2001).
We generated 1,000 non-parametric bootstrap samples from the original data set (see Appendix for details); for each of them we calculated MRE* from RT*. From the resulting distributions of MRE* we calculated CI68. To evaluate the variation of MPE, a sequential sampling model (Wiener processes) was fitted to each of the 1,000 bootstrap samples and 68% confidence intervals, CI68, were calculated from the resulting parameter distributions.
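The percentile-method CI68 can be sketched as follows, with resampling with replacement from the original data and the 16th–84th percentile interval read off the bootstrap distribution; the RT sample below is hypothetical:

```python
import numpy as np

def bootstrap_ci68(data, statistic, n_boot=1000, seed=0):
    """Non-parametric bootstrap: resample the data with replacement,
    recompute the statistic for each bootstrap sample, and return the
    16th-84th percentile interval (CI68) of the bootstrap distribution."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    boot = [statistic(rng.choice(data, size=data.size, replace=True))
            for _ in range(n_boot)]
    return np.percentile(boot, [16, 84])

# Hypothetical RT sample (ms); the CI68 of the mean roughly corresponds
# to the sample mean plus/minus one standard error.
rts = np.array([420, 455, 390, 510, 470, 430, 445, 480, 415, 460])
lo, hi = bootstrap_ci68(rts, np.mean)
```

The same resampling scheme applies to the derived indices: MRE* is recomputed from each bootstrap sample, and the model underlying MPE is refitted to each sample, yielding a bootstrap distribution for each index.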
Experiment 1: confidence intervals, CI68, calculated by the bootstrap percentile method as a function of stimulus intensity
Obviously, the variation was considerably larger for MPE than for MRE*. Nevertheless, both indices displayed similar patterns. MRE* was significantly larger than zero for intensity conditions I1, I4, and I6, indicating an enhanced performance due to bimodal stimulation for the respective stimulus intensities. For MPE, intensity conditions I1, I4, I5, and I6 were significantly larger than zero; i.e., the conclusions suggested by the two indices differed in only one out of eight conditions.
MRE* for intensity condition I1 was significantly larger than MRE* for condition I3 and for conditions I5 to I8, as the confidence intervals did not overlap for those conditions. No significant differences between intensity conditions were found for MPE.
An experiment was conducted to investigate the influence of intensity levels close to detection threshold on multisensory interaction and to test two proposed indices of overall performance, MRE* and MPE. Visual, auditory, and audio–visual stimuli of eight different intensities were presented to participants, and MRE was calculated from RT, as well as MDE from DR. Furthermore, MRE* was calculated from IES, and MPE from drift rates estimated from the experimental data. MRE* and MPE were subjected to bootstrapping procedures to evaluate the variability of both indices.
From earlier reports (e.g., Diederich & Colonius, 1987, 2004; Gielen et al., 1983; Hershenson, 1962; Miller, 1982), one would expect bimodal RTs to be faster than unimodal RTs. However, as can be seen from the MRE bar plots (see Fig. 3, panel b), this was only the case for intensity levels I4–I7. For intensity levels I1–I3 and I8, the fastest unimodal RT was equal to or even faster than the bimodal one. For the highest intensity level I8, the absence of multisensory enhancement might be due to a performance limit: the “irreducible minimum” of RT puts a lower boundary on RT facilitation (Woodworth & Schlosberg, 1956). For the lower intensity levels, an explanation is not as easy, since the POIE (Meredith & Stein, 1986; Perrault, Vaughan, Stein, & Wallace, 2005; Stein & Meredith, 1993) would predict multisensory response enhancement to be most prominent when stimuli are weak. However, if we broaden our analysis from RT and also take DRs into account, it becomes obvious from the MDE bar plots (see Fig. 3, panel d) that, for intensity levels I1–I5, multisensory enhancement actually did manifest in DR. The fact that no enhancement was observed for DR on intensity levels I6–I8 could be explained by ceiling effects: DR in the best unimodal conditions was already nearly perfect; therefore, further improvement through bimodal stimulation was not possible. Multisensory enhancement was evident in DR when it was absent in RT and vice versa; thus, isolated inspection of either MRE or MDE would have missed out essential information.
Importantly, the multisensory performance indices MRE* and MPE both indicate enhancement for conditions where either MRE or MDE, or both, did as well; i.e., they integrated the information revealed in RT and DR. Despite differences in absolute magnitude, the patterns exhibited by MRE* and MPE were very similar; i.e., the ordinal relations across index values of different intensity levels are almost invariant. Nevertheless, the differences in absolute magnitude were substantial, especially for the lowest intensity condition, and, based on the current data set, it cannot be decided whether one of the indices displays a numerical value more appropriate than the other, or whether both are somewhat out of range. This issue may not be settled with behavioral experiments alone but, rather, by simulation studies, because the latter allow one to manipulate the magnitude of enhancement directly.
To evaluate the variability of MRE* and MPE, we performed bootstrapping studies and, interestingly, the results were similar for both indices. Although the width of the resulting confidence intervals differed substantially (small for MRE* vs. large for MPE), both measures indicated significantly enhanced performance due to bimodal stimulation for almost the same conditions (concordance in seven out of eight conditions). Nevertheless, differences between MRE* and MPE were also observed. MRE* indicated significantly more enhancement for the lowest intensity condition, compared to five out of the seven remaining intensity conditions. This finding presents strong evidence in favor of the POIE (cf., Corneil et al., 2002; Diederich & Colonius, 2004; Rach & Diederich, 2006). Although a similar trend was present, no significant differences between intensity conditions were observed for MPE, probably due to the very large bootstrap confidence intervals. Unfortunately, it is not clear whether the magnitude of these confidence intervals represents true variability of the data, or rather a lack of reliability in the parameter estimation of the bootstrap samples. Nevertheless, it is important to note that the model-based approach underlying MPE also allows for a parametric bootstrap (cf., Wichmann & Hill, 2001), which often results in smaller confidence intervals.
To summarize the characteristics of both MRE* and MPE: both measures pick up and integrate characteristics exhibited by RT and DR resulting in very similar patterns, which, however, may differ in magnitude. Both measures lead to converging evidence in line with earlier research, although only MRE* allowed for statistical conclusions derived from bootstrap confidence intervals.
One of the reviewers raised the question whether the offset of the fixation LED, occurring simultaneously with the onset of the stimuli, might have led the participants to respond only to the offset of the fixation LED and to ignore the visual and auditory stimuli (i.e., performing a focused attention task with the offset of the fixation LED acting as target and the visual and auditory stimuli acting as non-targets). Although we agree that a detection task without catch trials may lead participants to respond to the offset of the fixation LED, we judge this possibility as very unlikely for this particular set of data, for the following two reasons: (1) the intensity of the fixation LED exceeded the intensity of the strongest visual stimulus by orders of magnitude; since the fixation LED was clearly detectable in every trial, one would not expect its offset to result in a pattern of detection rates as observed in Experiment 1 (see Fig. 3, panel c). In particular, one would not expect DRs much lower than 1, as observed for the lower intensity conditions; (2) participants were informed that some of the stimuli would be very weak and unlikely to be detected at all. Furthermore, it was emphasized that the task aimed at determining interindividual differences in perception rather than perfect detection performance.
Still, it cannot be ruled out that participants used the offset of the fixation LED as a warning cue, since it reliably announced the presentation of a stimulus. Such a cue would reduce or even eliminate the effect of the random durations of the fixation LED, resulting in faster RTs (Luce, 1986) and perhaps higher DRs. Since faster RTs typically result in lower response time gains on bimodal stimulation, the observed results would be more conservative compared to results with catch trials (cf., Gielen et al., 1983).
Nevertheless, this issue poses an interesting problem addressed in a subsequent experiment: Given that different task instructions lead to different patterns of results under identical stimulus conditions, in which conditions would these differences show up?
Two experimental paradigms have been proposed to investigate multisensory interaction. In the FAP, participants are instructed to respond only to stimuli of one particular modality (target stimuli) and to ignore all stimuli from other modalities (non-targets). In the RTP, participants are instructed to respond to any stimulus regardless of its modality. Previous studies comparing RTs recorded in FAP and RTP tasks reported that both unimodal and bimodal responses are faster in the RTP (Giray & Ulrich, 1993; Morrell, 1968). Both studies, however, compared data from separate experiments recorded with different participants. Here, both experimental tasks are compared in one and the same experiment (i.e., identical stimuli and participants) using a blocked design, allowing us to attribute any difference in results to differences between the tasks. Note, in particular, that the influence of stimulus intensity on DR and RT in bimodal conditions is different for the RTP and the FAP.
For instance, let us assume that both stimuli of a bimodal stimulus complex are perfectly detectable on unimodal presentation, and, hence, on bimodal presentation. Decreasing the intensity of one of the stimuli while leaving the intensity of the other stimulus constant should have different implications in RTP and FAP tasks. For RTP, decreasing the intensity of either of the stimuli should not worsen the detectability of the bimodal stimulus complex, because the unaltered stimulus alone should still be intense enough to be always detected. For FAP, however, it should make a difference whether the intensity of the target or the non-target is decreased. Decreasing the intensity of the non-target should not decrease the detectability of the bimodal stimulus, because the unaltered target stimulus would still be intense enough to be always detected. On the other hand, gradually decreasing the intensity of the target should lead to a decreased detectability of the bimodal stimulus sooner or later, because even if participants still perfectly detect the non-target, they are only allowed to respond if they detect the target too, which should become rarer with decreasing target intensity.
For mean RT, the situation is similar. In an RTP task, decreasing the intensity of either of the stimuli while leaving the other unaltered should increase bimodal RT, but only up to a certain limit where it equals the RT to the unaltered stimulus; any further decrease of stimulus intensity should not influence bimodal RT anymore. For the FAP, again it should make a difference whether the intensity of the target or the non-target is manipulated. Decreasing the intensity of the non-target should increase bimodal RT up to the point where it equals the RT to the target stimulus, and further decreasing the intensity of the non-target should have no effect on bimodal RT. In contrast, gradually reducing the target’s intensity should increase bimodal RT up to the point where no responses occur at all because the target stimulus is too weak to be detected.
In general, decreasing the intensity of one stimulus while leaving the intensity of the other stimulus constant increases the relative effectiveness of the latter. In the FAP, pairing a high-intensity non-target with a low-intensity target maximizes the possible influence of the non-target, because a stronger non-target can provide more response activation in order to elicit a response, whereas the opposite situation (i.e., a high-intensity target presented with a low-intensity non-target) would diminish the influence of the non-target because the high-intensity target already provides enough response activation itself. In the RTP, the influence of stimulus intensity is symmetrical: no matter which stimulus’ intensity is decreased, the other, high-intensity stimulus always provides enough response activation itself, reducing the influence of the low-intensity stimulus. Hence, the difference in intersensory facilitation effects (IFE) between the conditions with a high-intensity and a low-intensity stimulus should be larger in the FAP than in the RTP.
For all intensity conditions of Experiment 1, the RTs to auditory stimuli were about 80 ms faster than the RTs to visual stimuli. Although this difference is well below the 200 ms temporal window of integration that has been reported previously (e.g., Eimer, 2001; Meredith, Nemitz, & Stein, 1987), a smaller difference in unimodal RTs might have led to larger integration effects, because having equal unimodal RTs is assumed to maximize the probability of multisensory interaction (physiological simultaneity, cf., Hershenson, 1962). Nickerson (1973) argued that differences in RT between sensory modalities might be caused by differences in “internal arrival times”, that is, the time “that is required for the nervous system to do whatever it does before a response is evoked” (p. 501), assuming that response execution does not differ between sensory modalities. He concluded that stimuli that are matched for internal arrival times have a greater chance to interact than those that are not. Thus, experimenters often try either to match stimuli in terms of RT or, if that is not feasible, to delay the faster of the stimuli by a stimulus onset asynchrony (SOA) in bimodal conditions to increase the probability of multisensory interaction and the expected magnitude of IFE (Hershenson, 1962; Hilgard, 1933; Miller, 1986; Morrell, 1968). The latter approach was utilized in this experiment to study how the different measures of overall performance can be adapted to experimental designs with SOA.
Regardless of the experimental task, we expect (1) bimodal RT to be faster than unimodal RTs for both RTP and FAP; (2) IFE to be larger for pairs of low-intensity stimuli compared to pairs of high-intensity stimuli (i.e., the POIE). Furthermore, we expect (3) RTs to be faster in the RTP task than in the FAP task, and (4) the difference in IFE between the condition where one stimulus is of high intensity and the other is of low intensity to be larger in the FAP than in the RTP.
Methods and apparatus
Four students (ages 20–23 years) served as paid voluntary participants. All of them had participated in Experiment 1. Prior to their inclusion in this study, all participants were informed about the procedure and gave their informed consent. The experiment was conducted in accordance with the ethical standards described in the 1964 Declaration of Helsinki.
The study was conducted in exactly the same setup as Experiment 1.
Experiment 2: participant specific luminance (mcd) of visual stimuli, loudness (dBA) of auditory stimuli and stimulus onset asynchronies (ms) utilized in this experiment
Two different experimental tasks were utilized in this experiment. Participants were either instructed to respond to any stimulus they perceived (RTP) or to respond only to the visual stimulus (FAP). The tasks were balanced across experimental blocks, ensuring that each session consisted of blocks of either task in randomized order. Prior to each block, participants were informed about the particular task of that block by written instructions, which were additionally read out loud to them once more before the recording started. Within a block, visual, auditory, and audio–visual stimuli were presented. Visual and auditory stimuli were either of low or high intensity, resulting in four unimodal conditions (v, V, a, A) and four audio–visual conditions (va, vA, Va, VA). Note that unimodal auditory conditions served as catch trials in the FAP condition. Visual and auditory stimuli were presented either in the right or left hemifield. On audio–visual trials, both the visual and the auditory stimulus were always presented in the same hemifield. The beginning of each trial was indicated by the onset of the fixation LED, which was turned off after a variable fixation time (800–1,500 ms). On unimodal trials, stimulus presentation started simultaneously with the offset of the fixation LED. On audio–visual trials, presentation of visual stimuli always started simultaneously with the offset of the fixation LED, whereas presentation of the auditory stimuli was shifted by an SOA. Except for participant AS, the presentation of the auditory stimuli started simultaneously with or after the onset of the visual stimuli.
Prior to the main study, each participant completed 2 h of training. In a recording session of 1 h, a participant completed two blocks separated by a break of 10 min. Recording 80 trials for each of 16 conditions with each of 4 participants resulted in a total of 5,120 trials.
Data recording and preprocessing
A PC connected to the EyeLink was used for data storage and data preprocessing. Trials with RTs faster than 80 ms were classified as anticipation errors (0.1%) and treated as trials without a response, as were trials with RTs longer than 1,000 ms (misses, 0.9%). RTs from the right and the left hemifield showed no systematic difference and were therefore collapsed across hemifield of stimulus presentation.
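The trial classification described above can be sketched as follows. This is an illustrative reimplementation of the stated RT thresholds, not the authors' actual preprocessing code; the function name and return values are our own.

```python
import numpy as np

def preprocess_rts(rts_ms):
    """Split raw RTs (ms) into valid trials, anticipations, and misses.

    Thresholds follow the text: RTs < 80 ms are anticipation errors,
    RTs > 1,000 ms are misses; both count as trials without a response.
    """
    rts = np.asarray(rts_ms, dtype=float)
    anticipations = rts < 80
    misses = rts > 1000
    valid = rts[~(anticipations | misses)]
    return valid, int(anticipations.sum()), int(misses.sum())

# Example: one anticipation (50 ms) and one miss (1,200 ms) are removed.
valid, n_ant, n_miss = preprocess_rts([50, 250, 310, 1200, 400])
```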
For both experimental paradigms, unimodal stimuli of high intensity were detected more often and elicited faster RTs than stimuli of low intensity. The presentation of audio–visual stimuli always led to higher DRs and faster RTs, compared to presenting either of the unimodal stimuli alone. For both paradigms, the fastest RTs were recorded in audio–visual conditions when both stimuli were of high intensity (VA). Overall, RTs were slower in the FAP task than in the RTP task, which is in line with previous reports (Giray & Ulrich, 1993; Morrell, 1968). The influence of stimulus intensity on DR in audio–visual conditions was different for FAP and RTP. For FAP tasks, the highest DRs were observed when the visual stimulus was of high intensity, regardless of the intensity of the auditory stimulus (Va and VA). For RTP tasks, the highest DRs were recorded when either of the stimuli was of high intensity (Va, vA, VA). Since the presented stimuli and the presentation scheme were the same for both tasks, this difference strongly suggests that participants correctly performed two different experimental tasks, rather than, for instance, simply responding to the offset of the fixation LED.
To evaluate the effects of multisensory stimulation, the relative amount of multisensory enhancement was quantified by calculating MRE from RTs and MDE from DRs. For the results of the FAP tasks, only responses to unimodal visual stimuli were available since participants were instructed not to respond to auditory stimuli. Therefore, the minimum (respectively the maximum) of the unimodal responses was replaced by the visual response in Eqs. 1 and 2 for the evaluation of the FAP condition.
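Since Eqs. 1 and 2 are not reproduced in this excerpt, the following sketch assumes the standard forms of these indices (enhancement relative to the best unimodal condition); the function names are ours, and the exact normalization is our reading rather than a quotation of the authors' equations.

```python
def mre(rt_visual, rt_auditory, rt_bimodal):
    """Multisensory response enhancement (%), assumed standard form:
    speed-up of the bimodal RT relative to the fastest unimodal RT."""
    rt_min = min(rt_visual, rt_auditory)
    return 100.0 * (rt_min - rt_bimodal) / rt_min

def mde(dr_visual, dr_auditory, dr_bimodal):
    """Multisensory detection enhancement (%), assumed standard form:
    gain in bimodal DR relative to the best unimodal DR."""
    dr_max = max(dr_visual, dr_auditory)
    return 100.0 * (dr_bimodal - dr_max) / dr_max

def mre_fap(rt_visual, rt_bimodal):
    """FAP variant: only visual unimodal responses are available, so the
    visual RT replaces the unimodal minimum, as described in the text."""
    return 100.0 * (rt_visual - rt_bimodal) / rt_visual
```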
MRE and MDE displayed similar patterns for the FAP task, but differed in magnitude. For both measures, the largest magnitude of IFE was indicated for the presentation of a weak visual stimulus together with a high-intensity auditory stimulus (vA) and the smallest amount of IFE was observed when high-intensity visual stimuli were paired with low-intensity auditory stimuli (Va). MRE was slightly smaller for the presentation of two weak stimuli (va) compared to the presentation of two high-intensity stimuli (VA), whereas MDE was much larger for two weak stimuli (va) compared to two high-intensity stimuli (VA).
For the RTP task, the patterns of MRE and MDE were different. MRE was about the same magnitude for conditions with high-intensity auditory stimuli (VA and vA) and smaller for conditions with low-intensity auditory stimuli (Va and va). MDE was about 0 for condition VA and increased stepwise for the remaining conditions Va, vA, and va.
Evaluation of differences in overall performance utilizing bootstrapping procedures
Experiment 2: median diffusion model parameters and median inverse efficiency scores (IES) from 1,000 non-parametric bootstrap samples
Experiment 2: confidence intervals, CI68, calculated by the bootstrap percentile method as a function of stimulus intensity
For the FAP task, MRE* and MPE yielded similar patterns. The highest amount of IFE was observed for condition vA and both measures were about zero for condition Va. Furthermore, the magnitude of IFE was larger for condition va compared to condition VA, which is in line with the POIE. Significant differences in terms of non-overlapping bootstrap confidence intervals, CI68, were observed as follows. Significant increases of MRE* were observed between conditions Va and VA, as well as between Va and vA. For MPE, only the difference between vA and Va was significant.
The patterns of MRE* and MPE were also similar in the RTP task, as both measures increased stepwise from condition Va, to vA, and va. However, MRE* was significantly smaller for VA than for va, supporting the POIE, whereas MPE yielded a non-significant opposite trend. The difference between MRE* in conditions Va and vA was also significant. No significant differences were observed for MPE.
We conducted an experiment comparing the influence of different experimental instructions on overall performance in an audio–visual detection task. Participants were either instructed to respond to any stimulus they perceive (RTP) or to respond only to visual stimuli and to ignore auditory ones (FAP).
For all experimental tasks, performance on bimodal conditions was higher than performance on the unimodal visual condition, no matter whether performance was quantified in terms of RT, DR, or IES. This replicates earlier reports of RT reductions and DR improvements due to crossmodal stimulation. In the RTP task, the amount of multisensory interaction measured in terms of MRE* and MPE was larger when two stimuli of low intensity were presented together (va), compared to conditions where two stimuli of high intensity were presented (VA). This finding is in accordance with other studies reporting evidence in favor of the POIE.
In the FAP task, MRE* and MPE indicated no multisensory enhancement for condition Va and significantly more for condition vA. This observation makes sense in an FAP task, because the stronger auditory non-target in condition vA provides more response activation, whereas the high-intensity visual target in condition Va reduces the influence of the auditory non-target because it already provides enough response activation itself (cf., Bernstein, Chu, Briggs & Schurman, 1973; Bernstein et al., 1970; Colonius & Diederich, 2004). Furthermore, the multisensory enhancement observed in condition vA was even larger than in condition va, which may at first glance appear to be evidence against the POIE, because the overall intensity was smaller for the latter stimulus combination. However, in the FAP the POIE only accounts for the intensity of the target stimulus and not for the non-target, because participants are advised to respond only to the target. In the FAP, multisensory interaction occurs if a non-target is able to provide additional response activation, facilitating the response to the target. The relative effectiveness of the non-target, and therefore the amount of additional response activation, can be increased either by raising its intensity (i.e., direct effectiveness), or by decreasing the intensity of the target (i.e., inverse effectiveness). Note, however, that this principle only works within a limited range: if the intensity of the non-target is increased beyond a certain point, the unimodal RT difference between target and non-target may become too large to result in multisensory interaction. On the other hand, participants will stop responding if the intensity of the target is decreased below the detection threshold.
In concordance with our hypotheses, the difference between the relative response enhancement observed in conditions vA and Va was smaller in the RTP than in the FAP. The assumed symmetry between the stimuli in the RTP was nicely reflected in both IES and drift rates, as neither differed between vA and Va. For the FAP, both IES and drift rates did differ between these conditions, indicating lower overall performance in condition vA. This suggests that the intensity of the visual target stimulus had a larger influence on overall performance than the intensity of the non-target, which, as explained above, had a larger influence on the amount of response enhancement than the intensity of the target.
The diffusion model framework utilized for the calculation of MPE invites speculation about the nature of the observed differences between RTP and FAP. Despite the overall mean RT difference between RTP and FAP, the estimated base time Tr was the same in both paradigms (see Table 5). The slower mean RT in the FAP was reflected by a higher decision criterion θ and a negative β, that is, an initial state of evidence in favor of the alternative “Do not respond”. These differences suggest that in the FAP it was necessary to accumulate more information to elicit a response. This interpretation parallels the actual difference in task, as the RTP only requires deciding whether a stimulus is present, whereas the FAP additionally requires identifying the modality of the stimulus.
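A toy discretization illustrates how the parameters above shape RT: evidence starts at β, drifts at rate δ, and a response is emitted when the criterion θ is reached, after which the base time Tr is added. This is an illustrative single-boundary simulation with assumed parameter values, not the authors' fitting code.

```python
import random

def simulate_trial(delta, theta, beta, tr, dt=0.001, sigma=1.0, rng=random):
    """One simulated trial: accumulate noisy evidence from beta toward
    the response criterion theta at drift rate delta; return Tr plus
    decision time, or None if no response occurs within 5 s (a miss)."""
    x, t = beta, 0.0
    while x < theta:
        x += delta * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
        if t > 5.0:
            return None
    return tr + t

# A negative starting point beta (bias toward "Do not respond", as
# estimated for the FAP) lengthens simulated RTs relative to beta = 0.
random.seed(1)
rts = [simulate_trial(delta=2.0, theta=1.0, beta=0.0, tr=0.2)
       for _ in range(200)]
rts = [r for r in rts if r is not None]
```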
Summary and general discussion
In many experimental studies, multisensory interaction effects have been observed for both RT and DR. Here we suggested two quantitative indices for these effects that combine RT and DR into a single measure. Both measures relate the unimodal condition with higher performance to the performance under bimodal stimulation. The first index, MRE*, is a descriptive measure based on an arithmetical combination of RT and DR. It is calculated from inverse efficiency scores (IES), that is, RT divided by DR. The second index, MPE, is a model-based measure founded on a sequential sampling model. It is calculated from drift rates of a sequential sampling process (Wiener process) fitted to RT and DR.
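The IES computation (RT divided by DR) is stated in the text; the normalization of MRE* below is assumed to mirror the RT-based enhancement index (best unimodal vs. bimodal), which is our reading rather than a quotation of the authors' equation.

```python
def ies(mean_rt, detection_rate):
    """Inverse efficiency score: mean RT divided by DR.
    Lower IES indicates better overall performance."""
    return mean_rt / detection_rate

def mre_star(ies_visual, ies_auditory, ies_bimodal):
    """Assumed form of MRE*: relative improvement of the bimodal IES
    over the best (lowest) unimodal IES, in percent."""
    ies_best = min(ies_visual, ies_auditory)
    return 100.0 * (ies_best - ies_bimodal) / ies_best
```

Note that because IES penalizes slow responding and poor detection simultaneously, a faster bimodal RT bought at the cost of a lower DR does not inflate MRE*, which is the speed–accuracy tradeoff concern raised in the introduction.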
Applied to the results from two detection experiments, the two approaches provided converging evidence. Both indices showed patterns that were similar to each other and in line with previous research. Nevertheless, statistical conclusions in terms of bootstrap confidence intervals derived for both indices did not always coincide. The non-parametric bootstrap was utilized for both MRE* and MPE for the sake of better comparability, although the model-based approach of MPE would have allowed for a parametric bootstrap, which often results in smaller confidence intervals.
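The non-parametric bootstrap with the percentile method can be sketched as follows. In the study the resampled statistic was a full index (including model fits for MPE); here a simple mean stands in for illustration.

```python
import random

def bootstrap_ci68(data, statistic, n_boot=1000, rng=random):
    """Percentile-method CI68: resample the data with replacement
    n_boot times, compute the statistic each time, and take the 16th
    and 84th percentiles of the bootstrap distribution."""
    stats = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]
        stats.append(statistic(resample))
    stats.sort()
    return stats[int(0.16 * n_boot)], stats[int(0.84 * n_boot)]

# Illustrative use with hypothetical RTs and the mean as statistic:
random.seed(0)
rts = [210, 250, 240, 260, 230, 255, 245]
lo, hi = bootstrap_ci68(rts, lambda xs: sum(xs) / len(xs))
```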
Apart from the results obtained in the experiments reported here, both measures can be compared in terms of their validity, reliability, and practicability. Validity describes to which extent a measure really assesses the underlying theoretical construct. As mentioned before, a theoretical foundation for calculating IES (by dividing RT by DR) is lacking, it appears a bit ad hoc, and this questions the validity of MRE*. On the other hand, for MPE the theoretical background of sequential sampling models is quite elaborate, these models are successful in describing results of simple reaction tasks as well as choice reaction tasks, and neural underpinning of the model mechanisms have been identified (Diederich, 1995, 1997, 2008; Diederich & Busemeyer, 2003; Luce, 1986; Ratcliff & Smith, 2004).
In order to be reliable, a measure must remain nearly invariant when calculated repeatedly for the same data. For MRE* this is clearly the case, as only elementary arithmetic is involved. This is not necessarily the case for MPE, as it involves parameter estimation procedures that may lead to different results depending on particular starting values, objective functions, and minimization routines. Furthermore, achieving a fit does not imply that the global minimum has been found; better solutions may exist. Reliability can be crucial in bootstrap studies where thousands of model fits are necessary, since frequent poor model fits can add to the width of observed confidence intervals. Therefore, the reliability of MPE critically depends on adequate choices for optimization (relevant expertise can be found in the literature, cf., Wichmann & Hill, 2001).
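One common remedy for this starting-value dependence is a multi-start strategy: rerun the optimizer from several starting values and keep the best objective, reducing the risk of reporting a local minimum. The crude coordinate search and toy objective below are stand-ins for the actual model's loss and fitting routine.

```python
def coordinate_search(objective, x0, step=0.5, tol=1e-4):
    """Very simple 1-D local minimizer: step left/right while it
    improves the objective, halving the step when stuck."""
    x, fx = x0, objective(x0)
    while step > tol:
        moved = False
        for cand in (x - step, x + step):
            fc = objective(cand)
            if fc < fx:
                x, fx, moved = cand, fc, True
        if not moved:
            step /= 2.0
    return x, fx

def multistart(objective, starts):
    """Run the local search from several starting values and return
    the (x, f(x)) pair with the lowest objective."""
    fits = [coordinate_search(objective, s) for s in starts]
    return min(fits, key=lambda xf: xf[1])

# Toy objective with minimum at x = 2; any start converges here.
x_best, f_best = multistart(lambda x: (x - 2.0) ** 2, [-10.0, 0.0, 7.0])
```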
When it comes to practicability, MRE* is quite easy to handle because IES is computed right away, and so is MRE*. On recent computers, a non-parametric bootstrap takes only a couple of minutes, even for large data sets. In contrast, MPE requires model fits with large numbers of parameters and thereby relies on the quality of the estimation routine. Although we utilized a rather “brute force” approach (we passed estimated parameters multiple times through the estimation routine as start values, see Appendix for details), the computational effort was not substantial. For bootstrap studies, however, computation time multiplies and indeed becomes an issue, because a model must be fitted to each of the generated data sets.
We are aware of approaches that combine a good theoretical foundation with high reliability and practicability. For instance, it is possible to define multisensory indices based on average information rates (AIR, e.g., Fitts, 1966; Hick, 1952) calculated for unimodal and bimodal conditions. AIR is one of the information theoretical approaches that were adopted in psychology in the course of the so-called cognitive revolution more than 50 years ago (see Proctor & Vu, 2006, for a review, but also Baird, 1984, and Luce, 2003, for critical views). It can be used to measure the capacity of sensory channels in absolute identification tasks adopting the concepts of entropy and transmitted information (e.g., Baird, 1984; Fitts, 1966; Hick, 1952). AIR is calculated from RTs and accuracy data and yields channel capacity in terms of transmitted information units per time unit. However, since at least two signals (i.e., at least stimulus vs. no stimulus) are necessary to compute AIR, this approach cannot be applied to data obtained in simple detection experiments without catch trials like those presented in this study. Nevertheless, it appears as a promising approach for multisensory interaction assessment, in particular because it allows for the evaluation of choice tasks with more than two alternatives.
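The information-rate idea can be sketched as follows: transmitted information computed from a stimulus–response confusion matrix, divided by mean RT, yields bits per second. This is an illustrative reading of AIR under the standard mutual-information definition, not the cited authors' implementation.

```python
import math

def transmitted_information(confusion):
    """Transmitted information T(S;R) in bits, computed from a
    stimulus (rows) by response (columns) count matrix."""
    total = sum(sum(row) for row in confusion)
    p = [[c / total for c in row] for row in confusion]
    ps = [sum(row) for row in p]            # stimulus marginals
    pr = [sum(col) for col in zip(*p)]      # response marginals
    t = 0.0
    for i, row in enumerate(p):
        for j, pij in enumerate(row):
            if pij > 0:
                t += pij * math.log2(pij / (ps[i] * pr[j]))
    return t

def air_bits_per_second(confusion, mean_rt_s):
    """Average information rate: transmitted bits per second."""
    return transmitted_information(confusion) / mean_rt_s
```

As the text notes, this requires at least two discriminable signals: with a single stimulus class the confusion matrix degenerates and the transmitted information is zero, which is why the approach does not apply to simple detection experiments without catch trials.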
To conclude, with two different approaches we have demonstrated that speed and accuracy can be integrated into a single measure quantifying multisensory enhancement, and how the calculation of confidence intervals can be attempted with bootstrap procedures. Although both approaches led to good results, further examination of the properties of the presented measures in simulation studies appears warranted; such studies, however, are beyond the scope of this paper. Whenever computational power and time are available, we advocate the use of MPE together with the parametric bootstrap, as it outperforms MRE* in terms of validity. When computational power is an issue or a quick first impression is intended, MRE* can be beneficial. Ideally, both measures are used together in a multi-method approach to cross-validate conclusions.
Note that both a change of response bias and a change of overall performance can provide evidence for multisensory interaction if their occurrence can be attributed to the additional presentation of the auditory stimulus. In this study, however, our interest is directed towards the latter phenomenon.
We are aware of the ongoing debate on whether coactivation effects might be located in the motor component (e.g., Diederich & Colonius, 1987; Giray & Ulrich, 1993) or not (e.g., Miller, Ulrich & Lamarre, 2001; Mordkoff, Miller, & Roch, 1996). However, the purpose of this paper is not to take a stand in this debate, but rather to provide methods to combine different response measures into one single index of overall performance. Hence, base time as defined here does not include components that are influenced by coactivation effects.
A statistical test for the goodness of fit is given by χ2(21) = 29.6, p = 0.10 (27 parameters were fitted to 48 data points). Excellent fits were indicated for 5 out of 6 participants (observed χ2 values of 22.2, 11.9, 21.0, 10.9, and 18.6) and a very poor fit for the sixth participant (observed χ2 value of 54.6). Note, however, that we did not intend to test the diffusion model with this fit. Instead, we utilized the drift rates in a descriptive way to quantify overall performance as indicated by RT and DR.
SOAs were determined individually for each participant in a pilot study. See section “Stimuli” for details.
Note that, for constant θ, β, and Tr, all δi ∈ Δ are independent of each other and can thus be estimated separately.
This research was supported by Deutsche Forschungsgemeinschaft (DFG) Grant No. Di 506/8-1 to A.D. and by SFB/TR31 “Active Hearing”, Teilprojekt B4 to H.C.