A tutorial on testing the race model inequality
When participants respond in the same way to stimuli of two categories, responses are often faster when both stimuli are presented together (redundant signals) than when either stimulus is presented alone. This effect is known as the redundant signals effect. Several models have been proposed to explain this effect, including race models and coactivation models of information processing. In race models, the two stimulus components are processed in separate channels, and the faster channel determines the processing time. This mechanism leads, on average, to faster responses to redundant signals. In contrast, coactivation models assume integrated processing of the combined stimuli. To distinguish between these two accounts, Miller (Cognitive Psychology, 14, 247–279, 1982) derived the well-known race model inequality, which has become a routine test for behavioral data in experiments with redundant signals. In this tutorial, we review the basic properties of redundant signals experiments and the statistical procedures used to test the race model inequality in studies published between 2011 and 2014. We highlight and discuss several issues concerning study design and the test of the race model inequality, such as inappropriate control of the Type I error, insufficient statistical power, incorrect treatment of omitted responses and anticipations, and the interpretation of violations of the race model inequality. We make detailed recommendations on the design of redundant signals experiments and on the statistical analysis of redundancy gains, and we describe a number of coactivation models that may be considered when the race model has been shown to fail.
Keywords: Attention (divided attention and inattention) · Reaction time methods · Multisensory processing
Redundant signals effect
Redundant signals are typically utilized to ensure that important information is conveyed to an operator or user. For example, auditory-visual warning signals (i.e., flashing lights and sirens) are used on ambulances and police vehicles to make certain that drivers stop or yield as they head to their destinations. The use of multimodal redundant signals for alerting systems is imperative because they result in faster and more accurate responses than alerting systems without redundant coding. How does the human mind integrate the information from the different senses? By testing specific predictions of certain models, one can infer the organization and characteristics of human information processing. A classical study of redundant signaling was performed by Hershenson (1962; the very first study with redundant signals was reported by Todd, 1912, Section IV.2). Hershenson implemented a speeded response time (RT) task in which observers had to respond by a keypress whenever they were presented with visual, auditory, or audio-visual stimuli. The redundant, audio-visual signals elicited lower mean RTs than the single-signal conditions; this phenomenon is the “redundant signals effect.” Such redundant signals effects have been found across different response demands, such as go/no-go and choice RT tasks (Grice & Canham, 1990; Grice, Canham, & Boroughs, 1984; Grice, Canham, & Gwynne, 1984; Grice & Reed, 1992).
While Hershenson’s results refer to simple responses to bimodal stimuli (i.e., of two sensory modalities), redundant signals effects have also been reported for stimuli presented within one sensory modality. If participants are instructed to respond whenever they detect a visual signal, they are faster if two visual signals are presented simultaneously compared to a single-signal presentation (e.g., Corballis, 2002; Miller & Adam, 2006). In the auditory modality, the pattern is more complex (e.g., no redundancy gains with pure tones of equal frequency; Schröter, Ulrich, & Miller, 2007).
Several models explain such redundancy gains. In the so-called race model (Miller, 1982) or parallel, first-terminating model (Townsend & Ashby, 1983), the redundancy gain is a consequence of “statistical facilitation” (e.g., Raab, 1962). Detection times for single target stimuli vary from trial to trial according to some statistical distribution. Whenever two target stimuli are presented, the stimulus that is processed faster triggers the response (i.e., this stimulus wins the “race”). Because the processing time distributions for the two stimuli usually overlap, slow processing in one channel is compensated for by faster processing in the other channel. Consequently, mean RTs for redundant stimuli are faster than mean RTs for single stimuli.
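Statistical facilitation is easy to demonstrate by simulation. The following sketch draws single-channel detection times from arbitrary, purely illustrative lognormal distributions and lets the faster channel win on redundant trials; the redundant mean RT then falls below both single-signal means without any integration of the channels:

```python
import random
import statistics

random.seed(1)

def detection_time(mu, sigma):
    # Single-channel detection time; lognormal is an illustrative choice
    return random.lognormvariate(mu, sigma)

n = 20000
rt_a = [detection_time(5.5, 0.3) for _ in range(n)]   # auditory alone
rt_v = [detection_time(5.7, 0.3) for _ in range(n)]   # visual alone
# Race model: on redundant trials, the faster channel triggers the response
rt_av = [min(detection_time(5.5, 0.3), detection_time(5.7, 0.3))
         for _ in range(n)]

print(statistics.mean(rt_a), statistics.mean(rt_v), statistics.mean(rt_av))
```

Because the two channel distributions overlap, taking the minimum on every trial is enough to produce a redundancy gain in the mean RTs.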
Instead of separate processing of the two signals, coactivation models assert some kind of integrated processing of the two stimuli. For example, Miller (1982, Appendix) proposed that the activations induced by both stimuli add up and the summed activation enables faster response initiation (“coactivation model”; see also Blurton, Greenlee, & Gondan, 2014; Diederich, 1995; Miller, 2004; Schwarz, 1989, 1994; Townsend & Nozawa, 1995). In order to distinguish between separate processing (race model) and integrated processing (coactivation model), Miller (1982) derived the well-known race model inequality (RMI), which has become a routine test for behavioral data in experiments with redundant signals. If a given data set violates the RMI, separate processing cannot explain the redundancy gain (for details, see below). Consequently, most researchers would then assume the alternative explanation for the given data set (i.e., integrated/“coactive” processing of the two stimuli).
This article begins with a derivation of the RMI. We then provide a short review of the methods used to test for a violation of the RMI and highlight some of the frequently encountered issues when testing the RMI. We close with a description of coactivation models and two generalizations. For simplicity, the example of redundant signals from the visual and auditory sensory modalities shall be used throughout this article with A, V, and AV denoting the auditory, visual, and audio-visual conditions, respectively. Of course, the derivations hold for any other combination of stimuli within or across modalities.
Race model inequality
The race model is utilized to describe and predict how two or more stimuli may be processed in a redundant signals task. In a typical redundant signals task, three sets of RTs are involved; one for each single-signal stimulus and one for the redundant-signal stimulus. Miller’s (1982) original race model assumes that the information that gets passed through these channels is processed separately. In other words, activation from one channel does not accumulate with the activation of the other channel. From this assumption (and another assumption, “context invariance”; see below), the RMI can be derived (Miller, 1982, Inequality 2) that describes the upper limit of the redundancy gain.
We mention this here because Eq. 7 is sometimes used in studies with redundant signals (e.g., Stevenson et al., 2014; see also Current practice section below). We do not recommend using it because we think that the assumption of channel independence is poorly motivated.
The cumulative distribution functions in Inequalities 6A and 6B relate to theoretical probabilities that are not seen in an experiment. However, experimental data can be used to obtain empirical estimates Ĝ(t) for these probabilities. Ĝ(t) is given by the proportion of RTs below t in a specific experimental condition. These estimates can then be used to test whether the RMI holds or not in a given experimental task.
Testing the RMI
In RT studies in experimental psychology, hypothesis tests generally refer to differences in the mean RT observed under two or more experimental conditions. Most often, the “mean correct RT” is determined for the subset of correct responses that fall within some predefined response window, and a paired t test or an analysis of variance is used to test whether response speed differs between conditions. The theoretical prediction usually corresponds to the alternative hypothesis, so that a significant difference in mean RTs is taken as support for the theory.
For the RMI (Inequality 6B), the situation differs from this standard scenario in several aspects: (A) The RMI does not refer to mean RT or mean correct RT within a specific time window, but to the RT distribution F(t), that is, the probability that a response occurs within time t under the respective experimental condition. (B) The RMI refers to RTs for three experimental conditions instead of two, and the inequality is asymmetric, with one term on the left hand side, and two terms on the right hand side of the inequality. (C) The RMI is defined for all t, so violations may occur at any t. (D) The race model describes a mechanism, namely, separate, context-invariant processing of the redundant information provided by the two channels. Violation of the RMI rules out this mechanism but does not automatically support a specific alternative cognitive architecture. As a consequence of these four aspects, the test of the RMI is somewhat special—with special requirements for study design, preprocessing of the data, and statistical testing as well as the interpretation of the results. Conversely, blind application of standard methods can lead to biased results and wrong conclusions.
In this section we describe the most commonly used method that has been proposed for testing the RMI in a group of participants (Miller, 1982). For tests of single-participant data, see Miller (1986) and Maris and Maris (2003). We then focus on whether and how the method is affected by Issues A–D mentioned above and suggest ways to improve study design, analysis, and interpretation of the results.
The RMI can be tested by applying paired t tests to compare the empirical cumulative distribution functions Ĝ(t) of RTs for A, V, and AV at prespecified time points or quantiles. While the RMI refers to the theoretical probabilities F(t), the observed proportion of responses below t is denoted by Ĝ(t). The test involves three steps (Ulrich, Miller, & Schröter, 2007): (1) Ĝ A(t), Ĝ V(t) and Ĝ AV(t) are determined as the number of RTs below or equal to t, divided by the number of trials. (2) Then the sum of the distributions for auditory and visual stimuli is computed, Ĝ A+V(t) = Ĝ A(t) + Ĝ V(t). (3) Each participant now provides a pair of proportions Ĝ AV(t) and Ĝ A+V(t) for a given t. For reasonably large numbers of trials, such proportions tend to Normal distributions, and so does their difference, therefore, the proportions can be compared using paired t tests. If Ĝ AV(t) is significantly above Ĝ A+V(t) at any t, the race model is rejected (“vertical” test; see Fig. 1 for an illustration at t = 200 ms).
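A minimal sketch of the vertical test (with simulated, purely hypothetical RTs; the time point t0 and the Gaussian RT distributions are illustrative choices, not recommendations):

```python
import random
import statistics

random.seed(2)

def ecdf(rts, t):
    """Proportion of RTs at or below t (empirical CDF)."""
    return sum(rt <= t for rt in rts) / len(rts)

def fake_rts(mu, n=100):
    # Hypothetical Gaussian RTs in ms; real RTs are skewed, but this suffices here
    return [random.gauss(mu, 50) for _ in range(n)]

participants = [{'A': fake_rts(300), 'V': fake_rts(320), 'AV': fake_rts(240)}
                for _ in range(12)]

t0 = 250  # a prespecified time point
# Positive differences indicate a violation of the RMI at t0
d = [ecdf(p['AV'], t0) - ecdf(p['A'], t0) - ecdf(p['V'], t0)
     for p in participants]

mean_d = statistics.mean(d)
se_d = statistics.stdev(d) / len(d) ** 0.5
t_stat = mean_d / se_d  # paired (one-sample) t statistic, df = 11
print(t_stat)
```

In a real analysis, the significance tests at several prespecified t would then be combined under proper Type I error control (see the permutation procedure below in the text).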
Most often, a slightly more complicated procedure (“horizontal test”) is chosen that involves one additional step. Remember that the RMI states that F AV(t) is below F A(t) + F V(t), for all t. However, because the F(t) are monotonically increasing, the RMI equivalently states that F AV is to the right of F A + F V on its entire course (see Fig. 1 for an illustration). The latter statement refers to the inverse distributions (quantile functions). F and Ĝ are functions of time that return probabilities/proportions, for example, F V(200 ms) ≈ 20% in Fig. 1. The inverses F −1 and Ĝ −1 are functions of probabilities/proportions that return times, F V −1(20%) ≈ 200 ms. Therefore, Ĝ −1 AV(p) and Ĝ −1 A+V(p) are estimated for some prespecified set of p. Each participant then provides a pair of quantiles Ĝ −1 AV(p) and Ĝ −1 A+V(p) for a given p. If the sample size is large enough, the quantiles follow approximate Normal distributions (Bahadur, 1966), so that the pairs can be compared using paired t tests. If Ĝ −1 AV(p) is significantly to the left of Ĝ −1 A+V(p) at any p, the race model is rejected.
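The quantile estimators needed for the horizontal test can be sketched as follows; the percentile estimator is deliberately simple, and the RTs are hypothetical (see Hyndman & Fan, 1996, for refined estimators):

```python
import bisect
import math

def quantile(rts, p):
    """Smallest observed RT t with ECDF(t) >= p (a simple percentile estimator)."""
    s = sorted(rts)
    return s[max(0, math.ceil(p * len(s)) - 1)]

def quantile_sum(rts_a, rts_v, p):
    """Smallest t with ECDF_A(t) + ECDF_V(t) >= p, searched over the pooled RTs."""
    sa, sv = sorted(rts_a), sorted(rts_v)
    for t in sorted(rts_a + rts_v):
        g = (bisect.bisect_right(sa, t) / len(sa)
             + bisect.bisect_right(sv, t) / len(sv))
        if g >= p:
            return t
    return max(sa[-1], sv[-1])

# Hypothetical RTs (ms); in a real test, each participant contributes one
# pair of quantiles per prespecified p
rts_a, rts_v, rts_av = [200, 300, 400], [250, 350, 450], [180, 220, 260]
print(quantile(rts_av, 0.5), quantile_sum(rts_a, rts_v, 0.5))  # 220 250
```

Here the AV quantile (220 ms) lies to the left of the A+V quantile (250 ms), which is the direction of an RMI violation.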
It is unclear whether the horizontal test or the vertical test is more powerful. Both variants can be adjusted to the response speed of the individual participants (see below), and other measures of violation have been suggested (e.g., Colonius & Diederich, 2006). Regardless of the specific procedure, the probabilities or time points at which the test is performed should be carefully selected. For example, Miller (1982) used a horizontal test with p = 5%, 15%, 25%, and so forth.
In the following, we discuss the prerequisites for a proper test of the RMI, and we outline solutions for some typical problems in experimental research with redundant signals. As already mentioned above, the test of the RMI is special in the sense that (A) it refers to RT distributions instead of mean RTs, (B) it is asymmetric with only one term on the left but two terms on the right, and (C) it includes significance tests at multiple time points or quantiles. We now turn to these aspects in more detail.
Estimation of distribution functions and quantiles
We first consider a scenario in which accuracy is 100%. When the time point t is held fixed at a certain value (e.g., t = 200 ms), the proportion Ĝ(t) of RTs below or equal to t (i.e., the number of RTs ≤ t divided by the number of trials n in the respective condition) is an unbiased estimate of the underlying probability F(t) = P(T ≤ t). It is, therefore, relatively easy to obtain the numerical estimates Ĝ A(t), Ĝ V(t), and Ĝ AV(t) for the different t. For the horizontal test, it is slightly more difficult to obtain unbiased percentile estimates Ĝ −1(p): basically, one needs to find the time t at which Ĝ(t) crosses a certain level p (see Hyndman & Fan, 1996, for an overview of software solutions). Bias might occur with small numbers of trials, but simulations have shown that the bias is minimal and manifests only in exceptionally small samples (Kiesel, Miller, & Ulrich, 2007).
Errors and fast guesses
Inequality 8 is the so-called kill-the-twin correction (see Eriksen, 1988, for more details). Gondan and Heckel (2008) showed that Inequality 8 is not restricted to a two-state model of guess and response but also follows from a more general “deadline model” (Yellott, 1971) in which guess and response tendencies coexist in parallel. Because of the symmetry of Inequality 8, biases due to fast guesses cancel out, and the power to detect a violation of the RMI is increased (Miller & Lopes, 1991). For go/no-go and choice responses, the kill-the-twin correction looks slightly different (see Inequality 10).
Omitted responses and outliers
By default, omitted responses indicate that the participant did not detect the stimulus within some time limit. Data cleaning should be limited to those trials of an experiment in which the participant was distracted (this may include trials with omitted responses as well as trials with valid RTs). When estimating P(T ≤ t), omitted responses should be assigned an infinitely long RT (or any value above the maximum of the RT window). As a consequence, the distribution functions F(t) and their estimates Ĝ(t) do not necessarily tend to 0% and 100% at the boundaries of the RT window (see Fig. 1). This holds for Ĝ A(t), Ĝ V(t), Ĝ AV(t), and, of course, for the distribution of spontaneous responses in catch trials Ĝ C(t).
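In code, this amounts to keeping omitted responses in the denominator of the ECDF, for example (illustrative sketch):

```python
import math

def ecdf(rts, t):
    """Proportion of trials with a response at or before t; omitted
    responses enter as math.inf and thus never count as 'responded'."""
    return sum(rt <= t for rt in rts) / len(rts)

rts = [230, 260, 310, math.inf, math.inf]  # 3 responses, 2 omissions
print(ecdf(rts, 1000))  # 0.6, not 1.0: the ECDF need not reach 100%
```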
A permutation test for multiple time points
The RMI is predicted to hold for all t, so it may be violated at any t. Therefore, multiple t tests are used to test the RMI at different time points or quantiles, and the race model is rejected if the test is significant for at least one of them. Such a multiple test procedure requires some control of the significance level. Kiesel et al. (2007) showed that running multiple t tests results in a serious accumulation of the Type I error. On the other hand, a Bonferroni correction would be too conservative because the tests at the different time points are positively correlated.
The race model is tested at time point t 0 using the statistic T = đ(t 0) / se[đ(t 0)], with đ(t 0) = 1 / N × ∑ i d i (t 0) and se[đ(t 0)] denoting the mean and the empirical standard error of the differences d i (t 0) = Ĝ AV,i (t 0) + Ĝ C,i (t 0) − Ĝ A,i (t 0) − Ĝ V,i (t 0) for each participant i. At the boundary between the null hypothesis and the alternative hypothesis, the RMI turns into an equality, F AV(t 0) + F C(t 0) = F A(t 0) + F V(t 0), so that the left side and the right side are exchangeable. In other words, under the null hypothesis, the sign of d i (t 0) is random in each participant. This property is used in a permutation test to generate the distribution of the test statistic under the null hypothesis. In each simulation, the d i (t 0) are randomly signed, d i *(t 0) = ± d i (t 0), and the test statistic is calculated: T* = đ*(t 0) / se[đ*(t 0)]. This procedure is repeated, for example, 10,001 times, resulting in a distribution of 10,001 T* values simulated under the assumption of the race model. Similar to a standard one-sided significance test, the race model is rejected at α = 5% if the observed T is greater than 95% of the simulated T* values.
Generalization of this permutation test to multiple time points is straightforward. We illustrate this for three time points, t 1, t 2, and t 3, resulting in three differences d i (t 1), d i (t 2), d i (t 3) for each participant i, as well as three test statistics T 1, T 2, T 3. In each simulation, a single multiplier, s i * = ±1, is chosen at random for each participant i and is multiplied with the difference values, d i *(t 1) = s i * × d i (t 1), d i *(t 2) = s i * × d i (t 2), and d i *(t 3) = s i * × d i (t 3). Separate T* statistics are then calculated for the three time points, and the largest of the three T* is chosen, T*max = max(T 1*, T 2*, T 3*). This is the so-called T max statistic. The race model is rejected at the one-sided significance level α = 5% if the maximum of the observed T values, T max = max(T 1, T 2, T 3), is greater than 95% of the simulated T*max values.
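The Tmax procedure can be sketched in a few lines; the differences d i (t j ) are simulated here under a hypothetical RMI violation, whereas a real analysis would compute them from the observed Ĝ values:

```python
import random
import statistics

random.seed(3)

def tmax_permutation_test(d, n_perm=2000):
    """Tmax permutation test of the RMI at multiple time points.
    d[i] holds the differences d_i(t_1), ..., d_i(t_k) for participant i;
    under the null hypothesis, the sign of each participant's vector is random."""
    n, k = len(d), len(d[0])

    def tmax(rows):
        ts = []
        for j in range(k):
            col = [row[j] for row in rows]
            se = statistics.stdev(col) / n ** 0.5
            ts.append(statistics.mean(col) / se)
        return max(ts)

    t_obs = tmax(d)
    count = 0
    for _ in range(n_perm):
        # one random sign per participant, applied to the whole vector
        signs = [random.choice((-1, 1)) for _ in range(n)]
        t_star = tmax([[s * x for x in row] for s, row in zip(signs, d)])
        if t_star >= t_obs:
            count += 1
    return t_obs, count / n_perm  # one-sided permutation p value

# Hypothetical differences for 10 participants at 3 time points, shifted
# upward to mimic an RMI violation
d = [[random.gauss(0.08, 0.05) for _ in range(3)] for _ in range(10)]
t_obs, p = tmax_permutation_test(d)
print(round(p, 3))
```

Because the whole per-participant vector is flipped at once, the correlation between the tests at the different time points is preserved in the simulated null distribution.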
This permutation procedure is known to control the Type I error in correlated significance tests. It has been repeatedly used for multivariate significance tests, for example, to test for differences in EEG waveforms (Blair & Karniski, 1993). Gondan (2010, Table 1) simulated different scenarios to investigate the power of the permutation test. Because the tests at the different time points are highly correlated, the better control of the Type I error does not come at the price of a substantially reduced statistical power.
The time points t 1, t 2, t 3, and so forth can be adjusted to the participant’s individual response speed. To this end, the percentiles t 1,i = Ĝ i −1(5%), t 2,i = Ĝ i −1(10%), t 3,i = Ĝ i −1(15%), and so forth, are determined separately for each participant i, using the RTs of all conditions mixed together. The permutation test is then performed using the differences d i (t 1,i ), d i (t 2,i ), d i (t 3,i ), obtained at these individually determined time points. We recommend testing the RMI in the lower percentile range: for large t, the left-hand side of the RMI tends to 1, whereas the right-hand side tends to 2, so the RMI cannot be violated for large t. The Supplementary material to this article includes scripts written in R (R Core Team, 2015) that perform these computations at quantiles 5%, 10%, 15%, . . . , 45%. The choice of the specific quantiles is somewhat arbitrary; whichever quantiles are chosen, they should be specified a priori. If the quantiles are chosen after inspection of the data, the significance level is not protected.
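Obtaining such individually adjusted time points amounts to taking percentiles of each participant's pooled RTs; a minimal sketch with a hypothetical (unrealistically small) data set:

```python
import math

def individual_time_points(rts_by_condition, probs):
    """Percentiles of one participant's RTs pooled over all conditions,
    used as individually adjusted time points for the permutation test."""
    pooled = sorted(rt for rts in rts_by_condition.values() for rt in rts)
    n = len(pooled)
    return [pooled[max(0, math.ceil(p * n) - 1)] for p in probs]

# Hypothetical participant, for illustration only
conditions = {'A': [300, 340, 380], 'V': [320, 360, 400], 'AV': [240, 280, 320]}
print(individual_time_points(conditions, [0.05, 0.25, 0.45]))  # [240, 300, 320]
```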
Redundant signals experiments should be carefully designed to enable efficient analysis of the processing architecture. Of course, the experimental setup is determined by the primary research hypothesis. Here, we assume that the primary interest is whether the RMI holds or not, and we outline a minimalistic experimental setup with a simple response task that allows for a reliable and valid estimation of the respective RT distributions.
The stimuli of the different modalities should always be presented in random order; they should never be presented in separate blocks unless there are really good reasons for it. In nontechnical terms, the context invariance assumption (Eq. 4A) states that the participants “do the same” in all conditions. When stimuli of different modalities are presented in separate blocks, the participants are encouraged to do different things. For example, a participant might decide to close her eyes in a block with only A or AV stimuli in order to better “concentrate” on the auditory stimuli. The same participant would obviously follow another strategy in the block with the visual stimuli. This exaggerated, but plausible, example makes clear that blocked stimulus presentation almost surely leads to a violation of the context invariance assumption, which invalidates the whole derivation of the RMI. A similar problem arises in so-called pseudorandomized stimulus sequences: Many studies use stimulus sequences in which specific conditions are not presented more than three times in a row (e.g., mandatory shift of the modality after V—V—V). In such sequences it cannot be ruled out that motivated participants will become aware of this experimental strategy and prepare for modality shifts, which again violates the assumption of context invariance, and makes it difficult to interpret a violation or non-violation of the RMI.
On the other hand, random stimulus sequences might produce artificial redundancy gains. In random stimulus sequences, modality shifts are known to produce longer response latencies (e.g., Spence, Nicholls, & Driver, 2001). Because complete modality shifts cannot occur before redundant stimuli, shift costs selectively slow the unimodal conditions and might thus result in apparent coactivation. Therefore, several authors (e.g., Mozolic, Hugenschmidt, Pfeiffer, & Laurienti, 2008; Otto & Mamassian, 2012; Van der Stoep, Van der Stigchel, & Nijboer, 2015) presented sensitivity analyses that are based on the subset of stimuli that are free of modality shifts (e.g., A—A, V—V, AV—AV; e.g., Gondan, Lange, Rösler, & Röder, 2004), or stimuli that follow a specific modality (A—A, A—V, A—AV; e.g., Miller, 1986).
Minimizing fast guesses and controlling for them
In the standard redundant signals task with simple responses, there are three stimulus conditions (A, V, and AV). A response is executed on each trial, which induces a tendency to anticipate the stimuli. Therefore, the target stimuli should not be presented in a fixed rhythm. Instead, the prestimulus period should be random, with the duration preferably sampled from an exponential distribution (see Luce, 1986, for an excellent introduction to this topic). Many studies sample from a uniform distribution instead. A uniform distribution, however, raises expectations, especially at its end: If the duration of the prestimulus period is sampled from a uniform distribution between 1,000 and 2,000 ms, and a stimulus has not appeared for 1,900 ms, the participant can be sure that the stimulus will appear immediately and will be tempted to anticipate it. As a general rule, everything that makes the stimulus sequence foreseeable should be avoided unless the researcher is particularly interested in sequence effects and expectations.
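The advantage of the exponential foreperiod is its memorylessness, which is easy to check numerically (illustrative parameters):

```python
import random

random.seed(4)

def p_within(draws, elapsed, window):
    """P(stimulus within `window` ms | it has not appeared after `elapsed` ms)."""
    waiting = [d for d in draws if d > elapsed]
    return sum(d <= elapsed + window for d in waiting) / len(waiting)

n = 100000
uniform = [random.uniform(1000, 2000) for _ in range(n)]
expo = [1000 + random.expovariate(1 / 500) for _ in range(n)]  # mean 1500 ms

# After 1900 ms of waiting, a uniform foreperiod is almost surely about to end,
# whereas the exponential foreperiod is memoryless: the conditional probability
# of an imminent stimulus stays flat, so anticipation does not pay off
print(p_within(uniform, 1900, 50), p_within(expo, 1900, 50))
```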
Researchers should always consider using catch trials (i.e., trials with no stimulus, C) in redundant signals tasks with simple responses and randomly mix them in the stimulus sequence. Catch trials, by instruction, discourage participants from fast guesses. If fast guesses occur nevertheless, it is possible to estimate the guessing tendency from the “responses” to catch trials, that is Ĝ C(t), and use Inequality 8—thereby automatically applying the aforementioned kill-the-twin correction to eliminate guessing bias.
Power and sample size
This number exceeds z 0.95 with a probability of about 90%. The power estimate is slightly too optimistic (e.g., z approximation instead of the t distribution, absence of fast guesses, test at a single percentile instead of a permutation test, assumption of a quite large RMI violation at the same quantile or time point in each participant), but it illustrates that redundant signals experiments should be undertaken with enough participants and enough trials per condition.
Whereas the above design aspects could be considered as mandatory, we now outline some ways to make the experiment more “interesting” from the researcher’s point of view. By manipulating the strength of the stimuli, their relative onset, and the task requirements, one can get insight into the mechanisms underlying the integration of redundant information.
Go/no-go and choice response tasks
In many redundant signals experiments, participants make simple responses to A, V, or AV stimuli (e.g., Miller, 1986). The conclusions that can be drawn from such a basic experimental setup are, of course, limited (Miller, 1982). For example, researchers might wish to discern whether redundancy gains indeed reflect some kind of “integration” of the response mappings at the decisional stage, as proposed by some coactivation models (e.g., Feintuch & Cohen, 2002). Alternatively, do redundancy gains merely result from a speeding of RTs because the increased stimulus energy with multiple stimuli leads to higher “arousal” (Nickerson, 1973; Posner, Nissen, & Klein, 1976)? The simple response task cannot distinguish between these two underlying mechanisms.
To eliminate the possibility of generally higher arousal due to multiple stimuli in the redundant signal condition, some studies have used target (+) and nontarget (−) stimuli in each modality, yielding the four stimulus types A−V−, A+V−, A−V+, A+V+, with A−V− acting as the catch trial and A+V+ being the redundant stimulus (e.g., Gondan, Götze, & Greenlee, 2010; Gondan, Niederhaus, Rösler & Röder, 2005; Grice & Reed, 1992; Experiment 4 in Miller, 1982; Otto & Mamassian, 2012; Schröger & Widmann, 1998). The intensity of the overall stimulation can be kept constant across the experimental conditions. The redundancy gain corresponds to the difference between mean RTs for A+V+ and the mean RTs for A+V− and A−V+ alone. However, the interpretation of the redundancy gains is difficult because response conflicts may occur in A+V− and A−V+ (respond to A+ but withhold the response in V−) and may slow down the responses in these two conditions. Artificial redundancy gains may arise because A+V+ is obviously free of response conflicts (Fournier & Eriksen, 1990). For example, Grice and Reed (1992) reported slower responses for target stimuli that were accompanied by a nontarget distractor (A+V−), compared to single targets (A+).
An interesting experimental solution to this problem has been implemented by Grice, Canham, and Boroughs (1984). In addition to go- and no-go stimuli, Grice et al. introduced a neutral stimulus (0) in each modality, yielding three target combinations (A+V0, A+V+, A0V+), and three nontarget combinations (A−V0, A−V−, A0V−). Note that the neutral stimulus was always paired with either a target or a nontarget, so there were no ambiguous combinations such as A0, V0, or A0V0. When comparing A+V+ (redundant condition) to A+V0 and A0V+ (single target conditions), the intensity of the stimulation is kept constant in all conditions, and response conflicts in the unisensory conditions are avoided (e.g., Blurton et al., 2014; Feintuch & Cohen, 2002). For choice reaction tasks with two response alternatives, the design is identical, with the “−”-stimuli being associated with the second response alternative.
Omitting the kill-the-twin correction can again bias the test in favor of the race model.
Stimulus onset asynchrony manipulation
The expression F(t − 100) = P(T ≤ t − 100) = P(T + 100 ≤ t) implies that 100 ms are added to the RTs observed for A and C, consistent with the delayed onset of the auditory component in V100A (Miller, 1986) and with the shifted correction for fast guesses of the responses to unimodal auditory stimuli (for details, see Inequality 7 in Gondan & Heckel, 2008).
SOA manipulation is recommended for two reasons: If the processing times for the unimodal stimuli differ a lot, the redundancy gain might be very limited under any cognitive architecture, be it a race model or a coactivation model. If A is processed much faster than V, there is no reason to “wait” for the V stimulus in a redundant signals task. The RMI is not a powerful tool in such a situation. SOA variation might reestablish “physiological synchrony” (Hershenson, 1962), by equalizing the channel-specific processing times. Gondan (2009) provided a solution to the problem of multiple testing that arises from testing multiple SOAs. SOA variation can also be diagnostic regarding the mechanisms involved in coactivation (see Inequalities 13 and 14). Moreover, a number of coactivation models make testable predictions about the relationship between SOA and mean RT (e.g., Diederich, 1995; Colonius & Diederich, 2004; Gondan & Blurton, 2013; Miller & Ulrich, 2003; Schwarz, 1989, 1994; Ulrich & Miller, 1997).
Inequality 12 is easily seen to be a generalization of the RMI with kill-the-twin correction. Inequality 8 results when the intensity of the weak stimuli is zero. The generalization is important for two reasons: First, it allows for much more experimental flexibility because the basic setup (A, V, AV, catch trial) is being extended to a more realistic setting which includes only redundant stimuli. For example, in multisensory research, it is much more realistic to study the details of speech integration with audio-visual speech samples instead of comparing “natural” AV stimuli to “artificial” samples of A and V (e.g., Chandrasekaran, Lemus, Trubanova, Gondan, & Ghazanfar, 2011).
When the onset asynchrony is increased further and further, Inequality 13 converges again to the RMI with kill-the-twin correction (see Inequality 8). Again, predictions can be derived for the sign of the interaction contrast F AV(t) + F 100AV(t) − F A100V(t) − F V100A(t) for several processing architectures (serial, parallel, coactive; summarized in Gondan & Blurton, 2013).
This list is necessarily nonexhaustive and mainly reflects the authors’ experience and personal taste. Other very promising approaches measure response force or RTs for bimanual responses, or gain important insights by manipulating the contingencies between the stimulus components (e.g., Diederich & Colonius, 1987; Giray & Ulrich, 1993; Miller, 1991; Mordkoff & Yantis, 1991, 1993).
When the race model fails . . .
We have shown above that the RMI is a direct consequence of a parallel, first-terminating model (i.e., race model) and the assumption of context invariance. Violation of Inequality 8 or Inequality 10 indicates that either the race model is wrong, the context invariance assumption is untenable, or both (see Miller, 2015, for a discussion). In many studies violation of the RMI is considered as evidence for the validity of “the coactivation model” which assumes integrated processing of the redundant information. Because of the great variety of coactivation models, this line of reasoning is incomplete as long as the specific mechanism of integration is not specified and tested.
Since Miller (1982), a number of coactivation models have been developed. Some of these models are very general (e.g., Miller, 1986, Equation 3) while others predict specific shapes of the RT distribution (e.g., Otto & Mamassian, 2012). In the following we list an incomplete set of candidates that might be considered if the race model has been shown to fail, and the researcher is interested in what happened instead. It is very likely that there is no universal coactivation model that accounts for the redundancy gains observed in all the studies of the field—studies that differ with respect to stimuli, task demands, and participants.
Violation of the upper bound defined by Inequality 14 rules out history-free coactivation models in favor of accumulator models in which sensory evidence is collected over time.
Specific examples of such accumulator models are the Poisson counter and diffusion superposition models proposed by Diederich (1995) and Schwarz (1989, 1994), and the parallel grains model (Miller & Ulrich, 2003). In the counter superposition model (Schwarz, 1989), the buildup of sensory evidence is conceived as a series of impulses with independent, exponentially distributed intervals between successive impulses. Detection occurs as soon as a criterion of c > 1 impulses has been registered. For bimodal targets, the impulses are pooled in a common channel (linear superposition). As a consequence, the criterion is reached earlier, which produces the redundancy gain. Schwarz (1989, Figure 1) showed that the model accurately predicts the mean RTs for speeded manual responses to auditory-visual stimuli with different onset asynchronies (see the discussion in Diederich & Colonius, 1991; Schwarz, 1994). In the diffusion superposition model (Diederich, 1995; Schwarz, 1994), the buildup of sensory evidence is conceived as a time-homogeneous diffusion process with drift. Again, detection occurs when a criterion c > 0 is reached for the first time. For bimodal stimuli, the activation of the two channels is pooled, and the criterion is again reached earlier. Schwarz (1994) demonstrated that the diffusion superposition model accurately describes the means and variances of the RTs at the different SOAs reported by Miller (1986). In the parallel grains model (Miller & Ulrich, 2003), the buildup of sensory evidence is described by a discrete accumulation of information. Similar to Schwarz’s (1989) model, the contributions of both sensory channels are pooled for auditory-visual stimuli, so that a criterion of c impulses is reached earlier than for single signals. Prediction of the mean RTs in the standard redundant signals experiment seems quite satisfactory (Miller & Ulrich, 2003, Figures 7 and 8).
A second class of coactivation models keeps the assumption of a race between the channels (Eq. 1) but drops the notion of context invariance (Eq. 4A). For illustration, consider a minimal redundant signals task with 1/3 A stimuli, 1/3 V stimuli, and 1/3 AV stimuli. The prior probability of a stimulus in the auditory modality is then 2/3, because two of the three experimental conditions include an auditory component. If, however, a visual stimulus is presented, the posterior probability of an auditory stimulus drops to 1/2, because only one of the two conditions that include a visual component also includes an auditory component. Mordkoff and Yantis (1991) systematically manipulated such contingencies, as well as contingencies in the response relevance of the different channels, in a go/no-go task (e.g., with the auditory component predicting a target in the visual channel). The pattern of results indicated that the system adapts to such contingencies, with excitatory interactions when the posterior probability is greater than the prior probability and inhibitory interactions in the opposite case. Mordkoff and Yantis called this model an “interactive race model.”
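The prior and posterior probabilities in this toy design can be verified by direct counting over the three-condition stimulus set from the example above:

```python
from fractions import Fraction

# Equal numbers of A, V, and AV trials, as in the example above
trials = ["A", "V", "AV"]

# Prior probability that a trial contains an auditory component
prior_a = Fraction(sum("A" in s for s in trials), len(trials))   # 2/3

# Posterior probability of an auditory component, given a visual one
visual_trials = [s for s in trials if "V" in s]
posterior_a = Fraction(sum("A" in s for s in visual_trials),
                       len(visual_trials))                       # 1/2
```

Because the posterior here is smaller than the prior, this design would predict an inhibitory interchannel interaction under the interactive race account.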
In a recent approach, Otto and Mamassian (2012) investigated audio-visual redundancy gains with a special focus on sequence effects. Effects of the stimulus sequence have also been reported by Miller (1986, Table 4), Gondan et al. (2004), and Van der Stoep et al. (2015). Whereas Miller and Gondan et al. basically tested the RMI in subgroups of stimuli (e.g., only stimuli that follow an A trial; see above), Otto and Mamassian explicitly modeled the effects of the stimulus sequence. They demonstrated that a race mechanism can explain the redundancy gain when increased noise in the system is allowed for bimodal stimuli. For further examples and taxonomies, see Ulrich and Miller (1997), Colonius and Townsend (1997), and Townsend and Nozawa (1997).
Of the 83 articles that tested a race model, 77 (93%) claimed to have obtained a violation of the RMI. Yet only 9 of the 83 articles reviewed (11%) tested a coactivation model to describe and predict their data. Surprisingly, only 58 of the 83 studies (70%) performed some kind of statistical analysis, mostly percentile-specific t tests, with 9 studies (10%) not correcting for multiple testing at all and 23 studies (28%) using an overly conservative Bonferroni correction. Only 4 of the 83 studies (5%) used a permutation test, which allows for adequate Type I error control while preserving statistical power. In a surprisingly high number of studies (24 of 83), it was claimed that the race model inequality or “Miller inequality” was tested, although these studies actually used the so-called independent race model (Eq. 7), which results if the channel processing times are assumed to be stochastically independent. The shortcomings of this additional assumption were discussed in only 2 (3%) of these studies.
None of the studies (0%) explicitly incorporated a procedure to account for omitted responses in the distribution functions. Fifty-seven studies (69%) ignored misses by excluding them from the analyses; the other 26 studies (31%) did not mention how misses were treated in the analysis. Most articles defined a priori cutoff boundaries for the RT distributions, and fast guesses and misses were not accounted for in the distribution functions. When plots of the distribution functions were given, they were rescaled such that Ĝ(t) was 0% at the lower cutoff boundary and 100% at the upper cutoff boundary, which indicates that omitted responses and other problematic behavior were simply excluded. Only 7 of the 83 studies (8%) ran a kill-the-twin procedure, which prevents the distribution functions from becoming artificially inflated by fast guesses; in the other studies, the kill-the-twin procedure was not mentioned (and probably not performed).
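Keeping omitted responses in the distribution functions amounts to dividing by the number of trials rather than the number of responses, so that Ĝ(t) never reaches 100% when responses were omitted. A minimal sketch with made-up RTs:

```python
def ecdf_with_omissions(rts, n_trials):
    """G(t) = (number of responses <= t) / (number of trials).
    Omitted responses stay in the denominator, so G(t) remains
    below 1 whenever responses were omitted."""
    rts = sorted(rts)
    return lambda t: sum(r <= t for r in rts) / n_trials

# Hypothetical condition: 8 responses observed on 10 trials (2 misses)
G = ecdf_with_omissions([212, 230, 250, 260, 280, 300, 330, 400],
                        n_trials=10)
G(255)   # 0.3: 3 of the 10 trials yielded a response by 255 ms
G(1000)  # 0.8, not 1.0: the two omissions lower the whole function
```
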
Seventy-nine (95%) of the 83 studies correctly implemented an intermixed presentation of their experimental stimuli (instead of presenting the visual, auditory, and audio-visual stimuli in separate blocks). Although exponentially distributed foreperiods were already recommended in Luce’s (1986) seminal book on response times, only three articles (4%) implemented this timing procedure. Fifty studies (60%) sampled from uniform distributions instead, and 23 studies (28%) used a constant foreperiod.
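An exponentially distributed foreperiod is simple to implement; the minimum and mean below are placeholder values, not recommendations from the literature:

```python
import random

random.seed(0)

def foreperiod(minimum_ms=500, mean_extra_ms=1000):
    """Nonaging foreperiod: a fixed minimum plus an exponentially
    distributed part. The memoryless exponential keeps the hazard of
    stimulus onset constant over time, so participants cannot use the
    elapsed waiting time to anticipate the stimulus."""
    return minimum_ms + random.expovariate(1 / mean_extra_ms)

samples = [foreperiod() for _ in range(10000)]
# mean of samples is close to 500 + 1000 = 1500 ms
```

With a uniform or constant foreperiod, by contrast, the conditional probability of stimulus onset grows as time elapses, which encourages anticipatory responses.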
This short review is intended to underline the need for careful study design and data analysis in research on redundant signals effects. Some of the aspects summarized in Fig. 2 can be considered specific to redundant signals tasks and to testing the RMI, but others apply equally to other RT experiments and analyses (e.g., random foreperiods and stimulus sequences, careful justification of the data cleaning procedures routinely applied in RT research). Researchers should be aware that wrong or inefficient methodology may yield biased results (e.g., anticonservative decisions if issues of multiple testing are not taken seriously, loss of power if the kill-the-twin correction is not performed) and may lead to wrong conclusions.
Hypothesis tests in RT studies in Experimental Psychology most often refer to differences in the mean RTs observed under two experimental conditions. Redundant signals tasks differ from this standard setup: (A) The test of the RMI refers to a comparison of three instead of two experimental conditions. (B) Hypotheses refer to cumulative RT distributions instead of mean RTs. (C) The violation of the RMI does not automatically support a specific theory.
As a consequence of (A), the assumptions of standard data cleaning procedures are not automatically met. There are experimental paradigms in which standard data cleaning procedures yield improved estimates and valid inference; in redundant signals tasks, matters are more complicated. Biases induced by errors and outliers cannot be eliminated simply by excluding the errors and outliers from the analysis. Even in standard experiments comparing the “mean correct RT” between two experimental conditions, the exclusion of fast responses, omitted responses, and outliers is often based on common-sense arguments and ad hoc considerations of statistical power (e.g., Ratcliff, 1993). When the null hypothesis states that “behavior” is equal under two experimental conditions, this statement easily translates to the hypothesis of equal “mean correct RTs.” In contrast, because of the asymmetry of the RMI and the probabilities involved, conventional error removal and other ad hoc data cleaning procedures may cause bias (Eriksen, 1988; Miller, 2004). Especially in experiments with simple response tasks, it is advisable to include a condition with catch trials. Catch trials reduce the tendency toward fast guesses and allow researchers to perform a kill-the-twin correction by applying Inequality 8. For go/no-go or choice response tasks, corresponding kill-the-twin schemes can be developed that basically result in Inequality 10 (e.g., Blurton et al., 2014; Gondan et al., 2010). In general, errors, omissions, and outliers are an integral part of RT experiments.
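To make the kill-the-twin version of the test concrete, Inequality 8 can be evaluated directly on the estimated distribution functions, with the catch-trial responses entering as the correction term F_C. The RTs below are made up, and omissions are kept in the denominators:

```python
def ecdf(rts, n_trials):
    """Distribution function with all trials (including trials
    without a response) in the denominator."""
    rts = sorted(rts)
    return lambda t: sum(r <= t for r in rts) / n_trials

# Hypothetical data, 5 trials per condition; the single response on
# catch trials (C) is a fast guess that enters the correction term
F_AV = ecdf([205, 215, 240, 260], 5)
F_A  = ecdf([230, 250, 280, 310], 5)
F_V  = ecdf([240, 270, 300, 330], 5)
F_C  = ecdf([220], 5)

def race_model_violated(t):
    """Inequality 8: F_AV(t) + F_C(t) <= F_A(t) + F_V(t) under a race."""
    return F_AV(t) + F_C(t) > F_A(t) + F_V(t)

race_model_violated(240)  # True: 0.6 + 0.2 > 0.2 + 0.2 at t = 240 ms
race_model_violated(400)  # False: the inequality holds in the slow tail
```
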
Problems of multiple testing arise as a consequence of (B) because the RMI is predicted to hold for all t; thus, violations might occur at any t. Although it is obvious that testing at multiple time points leads to an accumulation of Type I errors (see also Kiesel et al., 2007), many of the reviewed studies did not account for multiple testing. Other studies used standard procedures for Type I error control (e.g., the Bonferroni procedure), which assume the worst case of negatively correlated significance tests. For the test of the RMI, however, the tests at multiple time points or quantiles are positively correlated: if the test is significant at time point t, it is probably also significant in close proximity to t. Therefore, we recommend using the permutation test for the RMI (Gondan, 2010). A permutation test that also includes the kill-the-twin correction is provided in the Supplementary material to this article.
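The logic of such a test can be sketched with a minimal sign-flipping tmax procedure in the spirit of Gondan (2010); this is not the exact procedure of the Supplementary material, and the participant differences below are fabricated for illustration:

```python
import random
import statistics

random.seed(2)

def tmax(diffs):
    """Maximum over time points of the one-sample t statistic of the
    participant-wise RMI differences d(t) = F_AV(t) - F_A(t) - F_V(t)."""
    n = len(diffs)
    t_stats = []
    for k in range(len(diffs[0])):
        col = [d[k] for d in diffs]
        s = statistics.stdev(col)
        t_stats.append(statistics.mean(col) / (s / n ** 0.5))
    return max(t_stats)

def rmi_permutation_test(diffs, n_perm=2000):
    """Sign-flipping permutation test: under the race model the
    differences are at most zero, so randomly flipping the sign of
    whole participant vectors approximates the null distribution of
    tmax at the boundary of the hypothesis."""
    observed = tmax(diffs)
    exceed = 0
    for _ in range(n_perm):
        flipped = [d if random.random() < 0.5 else [-x for x in d]
                   for d in diffs]
        exceed += tmax(flipped) >= observed
    return exceed / n_perm  # permutation p value

# Fabricated differences for 10 participants at 3 time points,
# all clearly positive (i.e., the inequality is violated)
diffs = [[0.05 + 0.01 * i, 0.08 - 0.005 * i, 0.03 + 0.002 * i]
         for i in range(10)]
p = rmi_permutation_test(diffs)  # small p: the race model is rejected
```

Because one tmax statistic is compared against its own permutation distribution, the significance level is controlled across all time points simultaneously without the conservatism of a Bonferroni correction.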
When the RMI fails, which mechanism can account for the observed results? A number of coactivation models have been proposed that can be broadly classified into context-invariant models that assume different mechanisms of integration (e.g., superposition models) and the so-called interactive race models that drop the assumption of context invariance. The superposition models proposed by Diederich (1995), Schwarz (1989, 1994), and Miller and Ulrich (2003, Section 6) are all based on additive superposition of activity in the two channels. In line with this, models of neural integration sites predict that integration of complex information is optimal if the signals arising from different sources are combined additively (Denève, Latham, & Pouget, 2001). This is in remarkable contrast to the conventional definition of multisensory brain regions, which underlines the special role of “superadditive” responses (e.g., Wallace, Meredith, & Stein, 1992; see Stevenson et al., 2014, for a discussion). Given that a number of phenomena can be explained with additive superposition, it seems worth considering such models in future investigations of behavior and neural firing (e.g., Stanford, Quessy, & Stein, 2005).
Tests of the RMI should follow these recommendations: (A) The RMI should be tested at multiple time points, and a permutation test should be used to control the significance level across these multiple tests. (B) Data cleaning should be avoided: outliers, omitted responses, and erroneous responses should not be eliminated from the data. Instead, the kill-the-twin correction should be applied; in other words, Inequality 8 (simple response tasks) or Inequality 10 (go/no-go or choice response tasks) should be tested. (C) When the race model fails, coactivation models should be considered.
Strictly speaking, only these contaminated RT distributions are observable in an experiment. To avoid cumbersome notation, the stars are omitted in the sections that follow.
Miller (1986) did not include a kill-the-twin correction, so his Inequality 3 does not include the F_C term. Because the kill-the-twin correction can be shown to hold for a standard race model for asynchronous stimuli (Gondan & Heckel, 2008), and because in the exponential coactivation model F_V100A(t) can only decrease in comparison to the standard race model, the correction term is justified.
We would like to thank Jeff Miller and an anonymous reviewer for their helpful comments and suggestions.
- Chandrasekaran, C., Lemus, L., Trubanova, A., Gondan, M., & Ghazanfar, A. A. (2011). Monkeys and humans share a common computation for face/voice integration. PLoS Computational Biology, 7, e1002165.
- Colonius, H., & Townsend, J. T. (1997). Activation-state representation of models for the redundant-signals-effect. In A. A. J. Marley (Ed.), Choice, decision and measurement (pp. 245–254). New York, NY: Erlbaum.
- Gondan, M. (2009). Testing the race inequality in redundant stimuli with variable onset asynchrony. Journal of Experimental Psychology: Human Perception and Performance, 35, 257–259.
- Hyndman, R. J., & Fan, Y. (1996). Sample quantiles in statistical packages. American Statistician, 50, 361–365.
- Luce, R. D. (1986). Response times: Their role in inferring elementary mental organization. New York, NY: Oxford University Press.
- Miller, J. (2015). Statistical facilitation and the redundant signals effect: What are race and coactivation models? Attention, Perception, & Psychophysics. doi:10.3758/s13414-015-1017-z
- Pesarin, F. (2001). Multivariate permutation tests with applications in biostatistics. New York, NY: Wiley.
- R Core Team. (2015). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
- Stevenson, R. A., Ghose, D., Krueger Fister, J., Sarko, D. K., Altieri, N. A., Nidiffer, A. R., Kurela, L. R., Siemann, J. K., James, T. W., & Wallace, M. T. (2014). Identifying and quantifying multisensory integration: A tutorial review. Brain Topography, 27, 707–730.
- Todd, J. W. (1912). Reaction to multiple stimuli. Archives of Psychology, 25, 1–65.
- Townsend, J. T., & Ashby, F. G. (1983). Stochastic modeling of elementary psychological processes. New York, NY: Cambridge University Press.