Over the past 5–6 decades, a number of different psychophysical paradigms have been used to assess human auditory temporal discrimination in young normal populations, and more recently, to compare them to other subpopulations—for instance, the elderly—or to a variety of pathological populations. Among these paradigms are duration discrimination of filled and empty intervals (Fitzgibbons & Gordon-Salant, 1994; Rammsayer & Leutner, 1996), gap detection (Schneider & Hamstra, 1999), temporal order judgments (TOJs) of monaurally (or binaurally) presented tones of different frequencies (Ben-Artzi, Fostick, & Babkoff, 2005), TOJs of the same stimulus presented to the two ears with differences in times of arrival of the stimulus at the two ears (i.e., dichotic TOJs; Ben-Artzi et al., 2005; Fink, Churan, & Wittmann, 2005, 2006; Fink, Ulbrich, Churan, & Wittmann, 2006; Fostick, Ben-Artzi, & Babkoff, 2011), and a variety of masking paradigms (Ramus, 2003; Rosen & Manganari, 2001).

One of the problems facing researchers who use psychophysical paradigms to compare young normal populations with other subpopulations is the degree to which the paradigms are sensitive to and vary with stimulus, methodological, and other parameters. To the extent that the paradigm used for comparison is invariant with respect to the above-noted parameters, the findings of differences among populations can be more confidently understood as reflecting basic differences in the neural processing associated with accurate performance.

Several of the paradigms used to test auditory temporal discrimination have been shown to be sensitive to stimulus and methodological parameters. As an example, gap detection thresholds have been shown by a number of researchers to be sensitive to (1) noise burst duration (He, Horwitz, Dubno, & Mills, 1999; Schneider & Hamstra, 1999), (2) noise burst intensity (Buus & Florentine, 1983; Plomp, 1964), and (3) proximity or distance of the gap from the stimulus onset (Snell & Hu, 1999).

Findings such as these argue for the necessity of having a broad knowledge base about the sensitivity of a paradigm to stimulus, methodological, and other parameters when comparing populations. Consequently, we wished to examine the extent to which the dichotic TOJ paradigm is influenced by the range of parameters generally used across different studies. In the present article, we report the effect of stimulus duration on performance of the dichotic TOJ task.

We have been using dichotic TOJs to compare the auditory temporal discrimination of young, normal, well-rested participants with that of other populations (see, e.g., Babkoff, Zukerman, Fostick, & Ben-Artzi, 2005; Ben-Artzi, Babkoff, & Fostick, 2011; Ben-Artzi et al., 2005; Fostick et al., 2011; Kleiner, Negbi, Or, Zuaretz, & Fostick, 2011) for a variety of reasons, most especially because the nature of a task involving the stimulation of two ears by the same stimuli, differing only in their temporal displacement, by definition mainly reflects central auditory processing (Ben-Artzi et al., 2005; Fostick et al., 2011).

Over the past two decades, a number of studies have used the dichotic TOJ paradigm to compare auditory temporal processing in various subpopulations. The reported estimates of dichotic TOJ thresholds in the control (young, normal participants) groups in these studies have varied (Table 1). The stimuli used in nine of the eleven studies published between 1991 and 2005 were transients, and therefore different from the tone stimuli that we used (e.g., Babkoff et al., 2005; Ben-Artzi et al., 2005). The methodologies used in these studies have also varied. Without correcting for the number of participants in each of the studies, the average threshold (the interstimulus interval [ISI] necessary for 75 % accuracy), expressed as the ISI from studies that have used click stimuli, is 45.72 msec with a standard deviation of 15.65 msec. The estimated mean dichotic TOJ thresholds for young, normal participants in the two studies that we published were 57.61 (Babkoff et al., 2005) and 49 msec (Ben-Artzi et al., 2005). The study by other authors that was most similar to one of ours, in terms of its stimuli and methodology (Ben-Artzi et al., 2005: 15-msec duration, 300- and 600-Hz tones), was that of Kolodziejczyk and Szelag (2008: 15-msec duration, 300-Hz tones), who reported an average (ISI) dichotic TOJ threshold of 37 msec. The latter authors did not publish standard deviations, but an estimate of their graphed values yields a standard deviation of around 20 msec. The uncorrected average reported dichotic TOJ threshold of all of the studies shown in Table 1 is 46.26 msec (±14.09). Most of the dichotic TOJ threshold values that we found in the literature are within one standard deviation of the overall uncorrected mean. Is this value a reasonable estimate of the overall dichotic TOJ threshold for all types of stimuli? 
Lacking any systematic reported data on the effect of stimulus variables on the dichotic TOJ threshold for tonal or noise-burst stimuli, we began by addressing the question: To what extent is the dichotic TOJ threshold affected by stimulus parameters?

Table 1 Dichotic TOJ thresholds reported in the literature

More specifically, in the present study, we addressed the following question: Do changes in ISI (the silent period between the offset of the tone at the leading ear and the onset of the second tone at the lagging ear) and the duration of the tones affect dichotic TOJ accuracy in the same or in different ways? In studies of temporal perception, stimulus duration was reported to affect the perception of stimulus onset time (Schimmel & Kohlrausch, 2008). Similarly, in studies of intermodality (audio–visual) stimulation, stimulus duration has been reported to impact judgments of temporal order (Boenke, Matthias, & Ohl, 2009). With the dichotic TOJ paradigm, one can increase the temporal delay between the onset of the stimulus at the leading ear and the onset of the stimulus at the lagging ear either by increasing the ISI without changing stimulus duration, or by maintaining the same ISI and increasing the duration of the stimuli. Although increasing the duration of the stimuli affects both the leading and the lagging tones, the temporal delay between the tones is primarily affected by the duration of the first tone.

Increasing ISI without increasing stimulus duration means increasing the silent period between stimulation of the two ears, while increasing tone duration and maintaining a constant ISI means increasing the total amount of stimulation at the two ears without changing the interstimulus silent period. Consequently, the two manipulations, ISI and tone duration, may affect dichotic TOJs differently. In the present study, we tested the effects of the two types of manipulations (ISI and tone duration) on the accuracy of dichotic TOJs.

Method

Participants

A group of 77 undergraduate students were recruited and randomly assigned to four groups in a between-subjects design to test tone duration (10, 20, 30, 40 msec). The data of seven participants were not included in the analysis because their performance accuracy at the longer tone durations was above 75 % even at ISI = 5 msec. The data of an additional five participants were not included in the analyses because their dichotic TOJ thresholds were more than two standard deviations longer than their respective group averages. Therefore, the analyses were carried out on 65 participants (age 21–25, 68 % females) with normal hearing. The participants were trained and tested individually in a single session.

Tasks, stimuli, and procedure

Dichotic TOJs

On each trial, two 1-kHz, 60-dB SPL pure tones were presented, with 1-msec rise/fall times. The tones were presented one to each ear, separated by temporal intervals. The participants were required to press the appropriate key on the computer keyboard corresponding to the order of stimulation between the ears (i.e., the temporal order of the tones: right–left or left–right). The tones were separated by ISIs (time between the offset of the first stimulus presented to the “lead” ear and the onset of the second stimulus presented to the “lag” ear) of 5, 10, 15, 30, 60, 90, 120, or 240 msec. The ISI values were randomized and repeated 16 times, producing 256 pairings (8 ISIs × 2 orders × 16 repetitions). After every 32 trials, participants received a short break. Each of the four groups of participants was tested with one of the four tone durations: 10, 20, 30, and 40 msec.
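
The trial structure described above can be sketched in a few lines of Python. This is only an illustration: the experiment itself was programmed in MATLAB, and the function names, the use of a single shuffle across the whole session, and the `seed` parameter are assumptions made here, not details taken from the study.

```python
import random

ISIS_MS = [5, 10, 15, 30, 60, 90, 120, 240]  # ISIs listed in the text
ORDERS = ["left-right", "right-left"]        # which ear receives the first tone
REPETITIONS = 16
BLOCK_SIZE = 32                              # a short break follows every 32 trials


def build_schedule(tone_duration_ms, seed=None):
    """Return a shuffled list of trials for one participant (hypothetical helper)."""
    rng = random.Random(seed)
    trials = [
        {
            "isi_ms": isi,
            "order": order,
            # Onset-to-onset delay: duration of the first tone plus the silent gap.
            "soa_ms": tone_duration_ms + isi,
        }
        for isi in ISIS_MS
        for order in ORDERS
        for _ in range(REPETITIONS)
    ]
    rng.shuffle(trials)  # assumption: one shuffle over the full session
    return trials


def split_into_blocks(trials, block_size=BLOCK_SIZE):
    """Cut the schedule into consecutive blocks separated by breaks."""
    return [trials[i:i + block_size] for i in range(0, len(trials), block_size)]


schedule = build_schedule(tone_duration_ms=20, seed=1)
blocks = split_into_blocks(schedule)
print(len(schedule), len(blocks))  # 256 8
```

Each trial also records the onset-to-onset delay (tone duration plus ISI), which is the quantity analyzed as SOA in the Results.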

Training

To familiarize the participants with the tones, they were first presented with five samples of the tone to one ear, then five samples of the tone to the other ear. Training then proceeded with 24 trials of single stimuli, 12 stimuli being randomly presented to each ear. The participant was required to identify the ear to which the tone had been presented by pressing the appropriate key. Visual feedback (“right”/“wrong”) was provided for each response. In the last stage of stimulus familiarization, the stimuli were presented in random order, with no feedback, until the participant met the criterion of 20 correct responses in 24 consecutive trials. After the familiarization phase, participants were presented with pairs of tones in two orders, left–right and right–left, with ISIs of 240 and 60 msec. Each of the ISIs was repeated 16 times, randomly intermixed with the other, resulting in 64 tone pairs. Participants were to identify the order of tone presentation by pressing the key indicating the location of the first tone, followed by the key indicating the location of the second tone. Visual feedback (“right”/“wrong”) was provided for each response during training. No feedback was provided during the experimental session.
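
One way to read the familiarization criterion (20 correct responses in 24 consecutive trials) is as a sliding window over the most recent 24 responses. The sketch below follows that reading. The sliding-window interpretation, the requirement that a full 24 trials have been run, and the function name are assumptions, since the text does not specify how the criterion was evaluated.

```python
from collections import deque


def trials_to_criterion(responses, window=24, required=20):
    """Return the 1-based trial on which the criterion is first met, or None.

    `responses` is an iterable of booleans (True = correct). The criterion is
    checked only once a full window of trials is available (an assumption).
    """
    recent = deque(maxlen=window)  # keeps only the latest `window` responses
    for trial_number, correct in enumerate(responses, start=1):
        recent.append(bool(correct))
        if len(recent) == window and sum(recent) >= required:
            return trial_number
    return None


# Five early errors followed by a run of correct responses.
print(trials_to_criterion([False] * 5 + [True] * 20))  # 25
```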

Prior to the experiment, participants had signed informed consent for participating in the study. The study was approved by the Bar-Ilan University ethics committee.

Apparatus

The TOJ task was presented via a Pentium personal computer that controlled the stimulus presentation and recorded responses. Auditory stimuli were generated by a sound-generator device (TDT-system II: Tucker-Davis Technologies, Gainesville, FL), and presented to one ear and, following a time delay, to the other ear through TDH-49 headphones. The task was programmed using MATLAB software, version 6.5. Speech perception was presented using a digital compact disk player with SONY’s Digital Reference MDR-CD770 earphones.

Results

Accuracy: ISI

The accuracy data were transformed by probit (a transformation for linearizing sigmoid distributions of proportions; Armitage & Berry, 1994). Three-way repeated measures analyses of variance (ANOVAs) were performed with the probit-transformed data as the dependent variable, ISI (i.e., the silent time between the offset of the tone at the lead ear and the onset of the tone at the lag ear) and leading ear as within-subjects variables, and tone duration as a between-subjects variable. The analysis revealed a main effect of ISI [F(7, 434) = 205.89, p < .001, partial eta-squared ($\eta_p^2$) = .77], but no effect of leading ear [F(1, 62) = 0.002, p > .05, $\eta_p^2$ = .00] or stimulus duration [F(3, 62) = 1.08, p > .05, $\eta_p^2$ = .05], nor any significant interactions: ISI × Ear [F(7, 434) = 0.94, p > .05, $\eta_p^2$ = .02], ISI × Stimulus Duration [F(21, 434) = 1.12, p > .05, $\eta_p^2$ = .05], Ear × Stimulus Duration [F(3, 62) = 0.73, p > .05, $\eta_p^2$ = .03], or ISI × Ear × Stimulus Duration [F(21, 434) = 1.12, p > .05, $\eta_p^2$ = .05]. In addition, the points of subjective equivalence (PSE) were calculated on the probit-transformed data for each leading ear separately. A two-way repeated measures ANOVA with leading ear as a within-subjects variable and tone duration as a between-subjects variable revealed no effects of leading ear [F(1, 62) = 2.3, p > .05, $\eta_p^2$ = .04] or tone duration [F(3, 62) = 0.65, p > .05, $\eta_p^2$ = .03], nor a Leading Ear × Tone Duration interaction [F(3, 62) = 0.88, p > .05, $\eta_p^2$ = .04]. The absence of a leading ear effect and of any difference in PSEs between the leading ears implies that no significant ear bias was involved in reporting the order of stimulation.
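
As a cross-check on effect sizes of this kind, partial eta-squared can be recovered from an F ratio and its degrees of freedom with the standard identity (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to two of the values reported above. The function name is chosen here for illustration, and small mismatches elsewhere can arise because the F values are themselves rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared implied by an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of ISI: F(7, 434) = 205.89
print(round(partial_eta_squared(205.89, 7, 434), 2))  # 0.77

# ISI x Stimulus Duration interaction: F(21, 434) = 1.12
print(round(partial_eta_squared(1.12, 21, 434), 2))   # 0.05
```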

In Fig. 1, the proportions of left-leading responses (dichotic TOJ accuracy) are plotted as a function of ISI (beginning with right ear leading, on the left of the abscissa) for the four tone durations. The PSE to left-leading responses of 50 % was calculated on the probit-transformed data using a linear equation. The PSE was not different from 0 msec [F(1, 62) = 0.04, p > .05, $\eta_p^2$ = .001] for any of the stimulus durations [F(3, 62) = 0.60, p > .05, $\eta_p^2$ = .03].

Fig. 1

Proportions of “left-leading” probit-transformed responses, in terms of ISIs, for the four tone durations

Because of the absence of any leading-ear effect, we were able to combine the right-ear-leading and left-ear-leading data for further analyses. Mixed-model repeated measures ANOVAs with ISI as a within-subjects variable and stimulus duration as a between-subjects variable revealed significant main effects of both ISI [F(7, 434) = 383.8, p < .001, $\eta_p^2$ = .86] and stimulus duration [F(3, 62) = 5.32, p < .01, $\eta_p^2$ = .21], as well as an ISI × Stimulus Duration interaction [F(21, 434) = 3.88, p < .001, $\eta_p^2$ = .16]. Post-hoc ANOVAs revealed significant effects of stimulus duration at the short ISIs [5 msec, F(3, 62) = 6.43, p < .01, $\eta_p^2$ = .24; 10 msec, F(3, 62) = 4.74, p < .01, $\eta_p^2$ = .19; 15 msec, F(3, 62) = 11.45, p < .001, $\eta_p^2$ = .36; 30 msec, F(3, 62) = 5.07, p < .01, $\eta_p^2$ = .20], but not at the longer ISIs (60, 90, 120, and 240 msec; ps > .05).

We fitted a single linear equation to the probit-transformed accuracy data across all four tone durations. The results indicated that when a single linear equation was used to model dichotic TOJ accuracy as a function of ISI, for the four tone durations ranging from 10 to 40 msec, $R^2$ was 88 %.

Accuracy: SOA

Figure 2 is a scattergram plot of the probit-transformed accuracy data when tone duration and ISI are incorporated into one measure, SOA. This measure represents the total delay between tone onset at one ear and tone onset at the other ear, rather than the delay between the offset of the stimulus to the first ear and the onset of the stimulus at the lag ear. For the transformed SOA data, the PSE was 0 msec, similar to the PSE obtained from the ISI data for all tone durations. When the transformed SOA data were fitted by a single linear equation, $R^2$ was equal to 94 %. The amount of variance accounted for by the SOA-transformed data was significantly greater than the amount of variance accounted for when accuracy was plotted as a function of ISI over four durations ($R^2$ = 88 %; Fisher r-to-z transformation, z = –2.02, p < .05).
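
The Fisher r-to-z comparison can be illustrated with the textbook formula for two correlation coefficients. The sketch below is not the authors' analysis script. It assumes that each fit is based on n = 65 (the number of participants retained for analysis) and treats the two correlations as independent. Both are assumptions made here, although under them the reported statistic is closely reproduced.

```python
import math
from statistics import NormalDist


def fisher_z_test(r1, n1, r2, n2):
    """Two-tailed test of the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher r-to-z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


N = 65  # assumption: sample size behind each fit
r_isi = math.sqrt(0.88)  # correlation implied by R^2 = 88 % (ISI fit)
r_soa = math.sqrt(0.94)  # correlation implied by R^2 = 94 % (SOA fit)

z, p = fisher_z_test(r_isi, N, r_soa, N)
print(round(z, 2), p < 0.05)  # -2.02 True
```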

Fig. 2

Proportions of “left-leading” probit-transformed responses, in terms of SOAs, for all tone durations

Dichotic TOJ thresholds

Dichotic TOJ thresholds were defined as the ISI necessary for 75 % accuracy, estimated using a linear function. These thresholds are plotted in Fig. 3 as a scattergram for all of the participants at all tone durations. The group mean thresholds are also plotted (Fig. 3). The best linear fit to the empirical data (scattergram) is also drawn (straight line), which represents a decrease of approximately 8.6 msec in ISI per 10-msec increase in tone duration ($R^2$ = .188, p < .001). We tested the group mean data against a model that predicted a 10-msec reduction in threshold per 10-msec increase in duration (also shown in Fig. 3, as a dotted line) and found that the data did not deviate significantly from this model (probit analysis, Z = 2.84, p < .01). Note that the four group means either fall on or deviate considerably less than 0.5 of a standard deviation from the dotted line. The additional data points plotted in Fig. 3 (noncircle geometric shapes) are explained below in the Discussion section.
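
The threshold definition above can be made concrete with a small sketch: probit-transform the proportions correct, fit a straight line against ISI, and solve for the ISI at which the line reaches probit(0.75). The accuracy values below are hypothetical and are not data from the study. The clipping of proportions away from 0 and 1, the ordinary least-squares fit, and all function names are assumptions made for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def probit(p, floor=0.01):
    """Inverse standard-normal CDF; p is clipped away from 0 and 1 (an assumption)."""
    p = min(max(p, floor), 1 - floor)
    return _STD_NORMAL.inv_cdf(p)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def threshold_isi(isis_ms, proportions_correct, target=0.75):
    """ISI (msec) at which the fitted probit line reaches the target accuracy."""
    slope, intercept = fit_line(isis_ms, [probit(p) for p in proportions_correct])
    return (probit(target) - intercept) / slope


# Hypothetical accuracy data for one participant (not from the study).
isis = [5, 10, 15, 30, 60, 90, 120, 240]
accuracy = [0.52, 0.55, 0.58, 0.66, 0.78, 0.86, 0.91, 0.98]
print(round(threshold_isi(isis, accuracy), 1))
```

Adding the tone duration to a threshold obtained this way expresses it as an SOA rather than an ISI.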

Fig. 3

Distribution of dichotic TOJ thresholds as interstimulus intervals (ISIs) in the present study as a function of stimulus duration, plotted as a scattergram, together with the individual means and standard deviations. The best linear fit to the empirical data is drawn (straight line), as well as a dotted line representing the theoretical equation of a 10-msec decrease in interstimulus interval (ISI) per 10-msec increase in duration. The mean from Babkoff et al. (2005, diamond), the mean and standard deviation from Ben-Artzi et al. (2005, square), and the mean from Kolodziejczyk and Szelag (2008, triangle) are shown, for comparison

In Fig. 4, dichotic TOJ thresholds are plotted in terms of SOA, as a function of tone duration. Note that the scattergram data and the four duration group averages fall very close to or on the zero-slope line (dotted line). The point at which the average dichotic TOJ thresholds (as SOAs) cross the vertical axis in Fig. 4 is approximately 56.78 ± 19.64 msec (probit analysis, Z = 8.09, p < .001). The additional data points plotted in Fig. 4 are explained below in the Discussion section.

Fig. 4

Distribution of dichotic TOJ thresholds as stimulus onset asynchronies (SOAs) as a function of stimulus duration

Discussion

The reported estimates of dichotic TOJ thresholds for young, normal individuals, expressed as the ISIs to achieve 75 % accuracy, have varied across a number of studies. In the present study, we examined and compared the effects of increasing ISI and tone duration, within the 10–40 msec range, on dichotic TOJ accuracy. The results indicated that increases in either ISI or tone duration increased dichotic TOJ accuracy similarly.

We posed the question of whether ISI and tone duration improve dichotic TOJ accuracy in the same or in different ways. On the one hand, both ISI and tone duration share the property of separating the onsets of the stimulus at the “lag” ear and the stimulus at the “lead” ear, thus leading to longer intervals between the onset of stimulation at the two ears. Viewed in this way, both manipulations may impact dichotic TOJs in the same way. On the other hand, manipulation of the two parameters differs, since increasing ISI means increasing the interstimulus silent period (the separation) between stimulation of the two ears, while increasing tone duration means increasing the amount of stimulation at the two ears, without affecting the interstimulus silent period. Viewed in this way, the two manipulations of ISI and tone duration may affect dichotic TOJs differently. Although increasing tone duration affects both the leading and lagging tones, we posited that if increasing tone duration affects dichotic TOJs exactly as ISI does—that is, by increasing the separation between the tone at the lead ear and the tone at the lag ear—the time separating the onset of the tone at the lead ear and the onset of the tone at the lag ear (i.e., the SOA) would be a more parsimonious predictor of dichotic TOJ accuracy than ISI. The results of the analysis in which we compared the amounts of variance accounted for by the two procedures verified this hypothesis.

The proposed model that tone duration impacts dichotic TOJ exactly as ISI does also leads to the hypothesis that dichotic TOJ thresholds expressed as ISI should decrease linearly, with a slope of –1, as tone duration increases. Thus, dichotic TOJ thresholds (expressed as ISI) should decrease 10 msec for every 10-msec increase in duration. When we plotted the average (group) data for the four tone durations as a function of tone duration, the data either fell on or deviated considerably less than 0.5 of a standard deviation from the predicted line, suggesting that ISI thresholds can be fitted by a line that decreases by 10 msec for each 10-msec increase in stimulus duration (see Fig. 3).

Figure 3 also includes the mean dichotic TOJ thresholds from three earlier studies, for comparison: those of Babkoff et al. (2005, diamond), Ben-Artzi et al. (2005, square), and Kolodziejczyk and Szelag (2008, triangle). Babkoff et al. used a 10-msec duration with 1- and 1.8-kHz tones; Ben-Artzi et al. (2005) used a 15-msec duration with 300- and 600-Hz tones; and Kolodziejczyk and Szelag used a 15-msec duration with 300-Hz tones. The average dichotic TOJ thresholds of the three data sets are within ±1 standard deviation from the projected interpolated points on the line representing a linear decrease of 10 msec in the dichotic TOJ (i.e., ISI) threshold for every 10-msec increase in tone duration.

The hypothesis that dichotic TOJ thresholds should decrease at exactly the same rate that tone duration increases also predicts that if the threshold data were converted to SOAs, there should be no significant change in thresholds over the entire range of duration increase. The data of the present study and the data from the earlier studies (Babkoff et al., 2005; Ben-Artzi et al., 2005; Kolodziejczyk & Szelag, 2008) were then plotted as dichotic TOJ thresholds (as SOAs) in Fig. 4. All of the average data fall very close to or on the zero-slope line (Fig. 4). In the present study, the average dichotic TOJ thresholds (SOA) for tone durations ranging from 10 to 40 msec were 56.78 ± 19.64 msec. The three dichotic TOJ thresholds from the previous published studies, converted to SOAs, are 67.61 msec (Babkoff et al., 2005), 63.89 msec (Ben-Artzi et al., 2005), and 52.8 msec (Kolodziejczyk & Szelag, 2008) (see Table 1), all of which are within approximately ±0.55 of a standard deviation of the dichotic TOJ threshold assessed in the present study.
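
The conversion from ISI thresholds to SOA thresholds amounts to adding the tone duration, and each converted value can then be expressed as a distance from the present mean in standard deviation units. The sketch below does this with the rounded ISI thresholds and durations quoted earlier in the article. Because of that rounding, two of the converted values (64 and 52 msec) differ slightly from the SOAs cited in the text. The dictionary layout and function names are illustrative assumptions.

```python
MEAN_SOA_MS = 56.78  # mean SOA threshold, present study
SD_SOA_MS = 19.64    # standard deviation, present study

# (ISI threshold in msec, tone duration in msec), as quoted in the article
EARLIER_STUDIES = {
    "Babkoff et al. (2005)": (57.61, 10),
    "Ben-Artzi et al. (2005)": (49.0, 15),
    "Kolodziejczyk & Szelag (2008)": (37.0, 15),
}


def isi_to_soa(isi_threshold_ms, tone_duration_ms):
    """SOA threshold = ISI threshold + duration of the leading tone."""
    return isi_threshold_ms + tone_duration_ms


def sd_distance(soa_ms, mean=MEAN_SOA_MS, sd=SD_SOA_MS):
    """Signed distance from the present-study mean, in standard deviations."""
    return (soa_ms - mean) / sd


soa_thresholds = {
    study: isi_to_soa(isi, duration)
    for study, (isi, duration) in EARLIER_STUDIES.items()
}

for study, soa in soa_thresholds.items():
    print(f"{study}: SOA = {soa:.2f} msec, {sd_distance(soa):+.2f} SD")
```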

Although there have been no systematic studies of the effect of changes in stimulus duration on dichotic TOJ accuracy and thresholds, some studies have investigated the effects of stimulus duration on simultaneity and TOJs with intermodality (e.g., audio–visual) stimulation and estimation of stimulus onset time. For example, Boenke et al. (2009) reported a systematic shift in the point of subjective simultaneity toward a common attractor value with increasing stimulus duration. Schimmel and Kohlrausch (2008) reported that increasing the duration of a sound systematically shifted the perceived positioning of the onset of a temporally displaced target sound. In neither of the aforementioned studies, however, did an increase in duration have the exact same effect as an increase in the ISI. We believe that the reason for the difference between our findings regarding the equivalence of ISI and stimulus duration, on the one hand, and the intermodality studies that have not shown an equivalence of ISI and duration, on the other, is that none of the tone durations in the present study or in the earlier studies of dichotic TOJs involved overlap of the stimuli whose temporal order was being judged, in contrast to the intermodality TOJ paradigms, in which stimuli do overlap. The data of the present study and of earlier studies of dichotic TOJs indicate that, within the stimulus duration range of 40 msec, an increase in stimulus duration has the same effect on accuracy and threshold as does an equivalent increase in ISI, which leads us to conclude that the more parsimonious parameter is SOA, which treats stimulus duration and ISI as the same variable.

In summary, the results of the present study and those of three published studies suggest that dichotic TOJ thresholds, converted to SOAs, are invariant to changes in tone duration within the range of 10–40 msec. Although we have not directly compared equal-duration tones of different frequencies and cannot conclude anything regarding the effect of tone frequency on dichotic TOJ thresholds, we note (Table 1) that the tone frequencies in the three previous studies differed from the one in the present study (1 and 1.5 kHz in Babkoff et al., 2005; 300 and 600 Hz in Ben-Artzi et al., 2005; and 300 Hz in Kolodziejczyk & Szelag, 2008). We cautiously suggest that, since the data from all of the studies seem to fall on the same dichotic TOJ threshold–duration line within less than a standard deviation, tone frequency within the range of 300–1500 Hz may also not affect dichotic TOJ threshold. Understandably, the latter suggestion awaits more systematic study.