INTRODUCTION

In an enclosed environment, the signal generated from a sound source reaches the listener both through a direct path and from multiple reflections off the room's surfaces. Although the listener receives reflections from different locations, the auditory system is generally able to localize the sound source rather accurately by suppressing the directional cues carried by the numerous reflections. The perceptual phenomenon of dominance of the directional information contained in the first arriving sound is known as the precedence effect (PE) (Wallach et al. 1949; Zurek 1987).

This natural situation of a direct sound followed by multiple reflections can be simplified by considering a direct sound with a single reflection. The direct sound (lead) and its reflection (lag) can be reproduced in the free field by two loudspeakers at different locations, driven with identical click stimuli with a delay between the onsets (lead–lag delay or inter-click interval (ICI)).

The perception of the lead–lag pair depends on the ICI and varies both in the number of perceived stimuli and in their perceived location. Although this variation is gradual and stimulus dependent, some approximate ranges of perception can be defined: a summing window, a precedence window, and an echo window (Fitzpatrick et al. 1999; Litovsky et al. 1999).

The summing window is defined by an ICI range between 0 and 1 ms (e.g., Litovsky et al. 1999), where the lead and the lag are perceptually fused into a single image and both contribute to the perceived localization of the fused event. The precedence window is defined by an ICI range from 1 ms up to the echo threshold (Fitzpatrick et al. 1999; Litovsky et al. 1999). Here, the percept is a fused event localized at the lead location. For this time range, the directional cues contained in the lag are weighted less heavily than those of the lead (Wallach et al. 1949; Litovsky et al. 1999). The echo window refers to the ICI range above the echo threshold, where the lead and the lag are audible as two separate sound images, each perceived at its own location (Blauert 1997). The echo threshold estimates the ICI at which the fused auditory event perceptually splits into two sound images. For clicks, the echo threshold occurs at ICIs of 2–10 ms (Freyman et al. 1991; Yang and Grantham 1997b; Litovsky et al. 1999), and studies using headphones generally observe smaller values (2–4 ms) than those using loudspeakers (Fitzpatrick et al. 1999; Litovsky et al. 1999).

Although the PE has been intensively studied over the last two decades (Lindemann 1986; Divenyi and Blauert 1987; Freyman et al. 1991; Fitzpatrick et al. 1995; Litovsky and Yin 1998; Fitzpatrick et al. 1999; Liebenthal and Pratt 1999; Hartung and Trahiotis 2001; Damaschke et al. 2005; Xia and Shinn-Cunningham 2011), the debate over whether the lag-suppression mechanism results from peripheral or central processes has remained unresolved. Previous studies have suggested the existence of monaural and peripheral mechanisms responsible for a reduction in the sensitivity to the spatial cues contained in the lagging stimulus (Tollin 1998; Tollin and Henning 1998, 1999; Hartung and Trahiotis 2001; Wolf et al. 2010; Xia and Shinn-Cunningham 2011). However, these studies consisted solely of psychoacoustical experiments (Tollin and Henning 1998, 1999), tests of computational models against psychoacoustical results (Tollin 1998; Hartung and Trahiotis 2001; Xia and Shinn-Cunningham 2011), or physiological findings in animals (Wolf et al. 2010). Monaural neural correlates of lag suppression were also reported by Wickesberg and Oertel (1990), Fitzpatrick et al. (1995), Parham et al. (1996), Fitzpatrick et al. (1999), and Tollin et al. (2004).

The current study investigated contributions to the PE at different stages along the auditory pathway by comparing psychoacoustical and physiological data obtained in the same human listeners. Three psychoacoustical experiments (a fusion task, an interaural time difference (ITD) detection task, and a lateralization task) were performed to investigate the perceptual phenomena related to the PE. Furthermore, two noninvasive physiological methods, click-evoked otoacoustic emissions (CEOAEs) and auditory-evoked brainstem responses (ABRs), were used to systematically examine the effect of the leading click on the lagging click at cochlear and brainstem levels and to experimentally test the hypothesis of a peripheral source of the PE.

METHOD

Six normal-hearing subjects (three females and three males), aged from 24 to 34, participated in the experiments. All had audibility thresholds of less than 20 dB hearing level at the frequencies in a standard audiogram. The experiments took place in a double-walled soundproof booth that was electrically shielded for the CEOAE and ABR experiments. All signals were generated digitally in MATLAB at a sampling rate of 48 kHz and consisted of 83 μs clicks.

Psychoacoustical experiments

The psychoacoustical experiments investigated two perceptual phenomena that characterize the perception of the lead–lag pair in the precedence window (Litovsky et al. 1999): fusion, which refers to the perception of one single, fused auditory event, and lag-discrimination suppression, which refers to the listener's difficulty in discriminating directional information contained in the lag.

The stimuli, consisting of lead–lag click pairs of the type presented in Figure 1A, were presented over headphones (Sennheiser HD580) using a D/A converter (type RME DIGI96/8 PAD). The lead–lag pairs were presented at 75 dB peak equivalent sound pressure level (peSPL) and had ICIs of 1, 2, 3, 4, 5, and 8 ms. Two stimulus conditions were considered: a reference condition (ITD = 0, lead and lag perceived at the center of the head; Fig. 1A left) and a deviant condition (lag-ITD > 0, lag lateralized towards the left; Fig. 1A right).
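The reference and deviant configurations described above can be illustrated with a minimal sketch. It assumes rectangular clicks of four samples (about 83 μs at 48 kHz), unit amplitude, and rounding of the ICI and lag-ITD to the nearest sample; the function names, the buffer length, and the rounding are assumptions made for illustration and are not taken from the original MATLAB implementation.

```python
FS = 48_000          # sampling rate in Hz (from the text)
CLICK_SAMPLES = 4    # 4 samples at 48 kHz is about 83 us (from the text)


def us_to_samples(t_us, fs=FS):
    """Convert microseconds to the nearest whole number of samples (assumed rounding)."""
    return round(t_us * 1e-6 * fs)


def lead_lag_pair(ici_ms, lag_itd_us=0.0, n_samples=1024, fs=FS):
    """Return (left, right) sample lists for one lead-lag click pair.

    The lead is diotic. The lag starts ici_ms after the lead in the left
    channel and is delayed by a further lag_itd_us in the right channel,
    so a positive lag-ITD lateralizes the lag towards the left.
    """
    left = [0.0] * n_samples
    right = [0.0] * n_samples
    lag_left = us_to_samples(ici_ms * 1000.0, fs)
    lag_right = lag_left + us_to_samples(lag_itd_us, fs)
    # First tuple is the lead onset, second tuple is the lag onset.
    for onset_l, onset_r in ((0, 0), (lag_left, lag_right)):
        for k in range(CLICK_SAMPLES):
            left[onset_l + k] += 1.0
            right[onset_r + k] += 1.0
    return left, right


# Reference (ITD = 0) and deviant (lag-ITD = 300 us) at an ICI of 2 ms.
reference = lead_lag_pair(2.0)
deviant = lead_lag_pair(2.0, lag_itd_us=300.0)
```

At 48 kHz a lag-ITD of 300 μs does not fall on a sample boundary, so this sketch realizes it as 14 samples (about 292 μs); the original stimuli may have handled sub-sample delays differently.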

FIG. 1

A Schematic stimulus configurations used in the behavioral experiments: reference and deviant. The reference configuration consists of two diotic click pairs (ITD = 0), delayed by an inter-click interval (ICI). In the deviant configuration, the lead is represented as a diotic click pair (lead-ITD = 0) and the lag as a dichotic click pair (lag-ITD > 0). B Interleaved stimulus presentation used in the CEOAE experiment. Three configurations (SC single click; DC double click; DCI double-click inverted) were repeated 1,800 times within a sequence for each ICI condition and for an ITD of 300 μs. C Stimulus presentation for the ABR experiment. A deviant configuration was repeated 2,000 times, for each ICI condition and for an ITD of 300 μs. The ABRs were recorded by using four electrodes: Fz (ground, positioned at the forehead), Cz (reference, positioned at the vertex), and M1 and M2 (left and right mastoids).

Fusion test

An adaptive one-interval, two-alternative forced-choice (2AFC) procedure was adopted to determine the echo threshold, i.e., the ICI for which the deviant was perceived as two separate clicks. Each presentation consisted of a deviant with a lag-ITD of 300 μs, for which the ICI was varied between 1 and 7 ms. The test was carried out both for monaural and binaural stimulation to investigate the contribution of binaural processing to fusion. In both tests, the subjects' task was to specify whether they perceived a single click (SC; fused image) or two separate clicks (lead and lag). The subjects were instructed to press the two-click response only when they could hear two auditory events clearly separated in time (monaural test) or in space (binaural test). The starting value of the ICI was 1 ms; it was increased after each single-click response and decreased after two consecutive two-click responses. The initial step size was 1 ms and was reduced after each lower reversal, first to 0.5 ms and then to 0.3 ms, as the threshold was approached. The echo threshold was obtained after six reversals and corresponded to the 70.7 % point on the psychometric function. Thresholds were obtained as the average of three repeated measurements.
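The tracking rule of the fusion test can be sketched as follows. The up/down rule, the step sizes, the ICI limits, and the stop criterion of six reversals follow the description above. The way the threshold is computed from the reversals is not stated in the text, so `estimate_threshold` (mean of the last four reversals) is an assumption, as are all function and parameter names and the idealized listener used in the example.

```python
def run_staircase(respond, start_ms=1.0, steps_ms=(1.0, 0.5, 0.3),
                  n_reversals=6, lo_ms=1.0, hi_ms=7.0, max_trials=500):
    """Track the echo threshold with the up/down rule described in the text.

    respond(ici_ms) must return 1 (single click heard) or 2 (two clicks heard).
    Returns the list of ICIs at which the track reversed direction.
    """
    ici = start_ms
    step_idx = 0
    two_click_run = 0
    last_dir = 0            # +1 = last change was up, -1 = down, 0 = none yet
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(ici) == 1:
            two_click_run = 0
            direction = +1              # single click: increase the ICI
        else:
            two_click_run += 1
            if two_click_run < 2:
                continue                # wait for a second two-click response
            two_click_run = 0
            direction = -1              # two in a row: decrease the ICI
        if last_dir and direction != last_dir:
            reversals.append(ici)
            if direction == +1 and step_idx < len(steps_ms) - 1:
                step_idx += 1           # lower reversal: shrink the step
        last_dir = direction
        ici = min(hi_ms, max(lo_ms, ici + direction * steps_ms[step_idx]))
    return reversals


def estimate_threshold(reversals, n_last=4):
    """Assumed estimator: mean ICI over the last n_last reversals."""
    tail = reversals[-n_last:]
    return sum(tail) / len(tail)


# Idealized listener who always hears two clicks at ICIs of 4 ms or more.
track = run_staircase(lambda ici: 2 if ici >= 4.0 else 1)
threshold_ms = estimate_threshold(track)
```

A real listener responds probabilistically near threshold, which is why the rule converges on the 70.7 % point of the psychometric function rather than on a sharp boundary.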

ITD-detection test

This test investigated lag-discrimination suppression by studying lag-ITD detection as a function of ICI. Seven sequences containing references and deviants were presented, one for each of the following ICIs: 0, 1, 2, 3, 4, 5, and 8 ms. Within each sequence, the ICI was constant and the deviants were randomly presented among the references, allowing a minimum of three references between the presentation of two deviants (Damaschke et al. 2005). The deviants contained ITDs ranging from 150 to 900 μs with a step size of 150 μs. Each ITD was repeated three times within the same sequence for a total of 18 deviants per sequence (six lag-ITDs repeated three times). The interval between the onset of one lead–lag pair and the onset of the following pair was 1 s. The subjects' task was to hit a button on the keyboard whenever a noncentered click pair (i.e., a deviant) was detected among the centered references. The response was considered correct when the button was pressed within 1 s after the presentation of the deviant. False alarms were accounted for by calculating the ratio between the number of correct hits and the total hits for each sequence. Subjects were asked to repeat those sequences where the ratio was below 70 %. The ITD-detection threshold was calculated as the lag-ITD that corresponded to 67 % correct performance, i.e., when the lag-ITD was correctly detected at least two times out of three for each sequence.
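The scoring of one sequence can be written out in a few lines. The sketch below assumes that the threshold is taken as the smallest lag-ITD meeting the two-out-of-three criterion (the text does not say how non-monotonic response patterns were handled); the function names and the example counts are hypothetical.

```python
def hit_ratio(correct_hits, total_hits):
    """Ratio of correct button presses to all button presses in a sequence."""
    return correct_hits / total_hits if total_hits else 0.0


def needs_repeat(correct_hits, total_hits, min_ratio=0.70):
    """A sequence is repeated when the hit ratio falls below 70 %."""
    return hit_ratio(correct_hits, total_hits) < min_ratio


def detection_threshold(hits_by_itd, criterion=2):
    """Return the smallest lag-ITD (in us) detected at least `criterion` times.

    hits_by_itd maps each lag-ITD to its number of correct detections out of
    three presentations. Returns None if no lag-ITD reaches the criterion.
    """
    passing = [itd for itd, hits in hits_by_itd.items() if hits >= criterion]
    return min(passing) if passing else None


# Hypothetical detection counts for one ICI (six lag-ITDs, three repeats each).
example = {150: 0, 300: 1, 450: 2, 600: 3, 750: 3, 900: 3}
threshold_us = detection_threshold(example)
```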

Lateralization test

The stimulus presentation consisted of one interval containing two lead–lag pairs: a reference followed by a deviant. The reference and deviant had the same ICI, with values among: 0, 1, 2, 3, 4, 5, or 8 ms. The deviant contained a lag-ITD in the right channel, which was randomly varied among: 0, 150, 300, 450, 600, 750, 900, and 1,000 μs. Each ITD was repeated three times for each ICI. After the presentation of each reference-deviant pair, subjects were asked to press one of the six response buttons ((1) left, (2) center, (3) center and center, (4) center and right, (5) center and left, and (6) center and left and right) according to the perceived lateralization of the deviant with respect to the reference. The six buttons were designed to take all possible percepts of the deviant into account, both when fusion occurred and when fusion was no longer present. In the case of a fused percept, a SC was perceived, either to the left (when the ITD was detected) or at the center. Otherwise, lead and lag were perceived as two separate clicks, where the lead was always perceived as centered, and the lag was perceived either at the center, left, or right, or as two clicks to the left and to the right. Although the lag ITD was leading to the left ear, the percepts of the lag either to the right, or to the left and right, were included to account for the possibility of different monaural suppressions of the lagging clicks in the left and right ear (e.g., for large ITDs). The lateralization threshold was calculated for each ICI as the minimum ITD producing at least two times out of three (67 %) a noncentered percept of the deviant.

CEOAE recordings

The stimuli were sent via the open source software pa-wavplay to the soundcard (RME FireFace 800 A/D-D/A converter, RME Intelligent Audio Solutions, Germany). The clicks were calibrated at a level of 65 dB peSPL in a BK-2012 ear-canal coupler (Brüel & Kjær Sound & Vibration Measurement A/S, Denmark), attached to a BK-4157 artificial ear. After insertion of the recording probe in the ear canal, in situ calibration was performed using a TDT-PA5 programmable attenuator (Tucker-Davis Technologies, Alachua, FL) to ensure that the levels of the clicks in the ear canal were equal in each ear. The stimuli were presented to the left and right ear of the test subjects via two ER-2 earphones (Etymotic Research, Inc., Elk Grove Village, IL). Recordings were performed using two ER-10B+ low-noise microphones and were bandpass filtered between 0.6 and 5 kHz (analog Rockland 852 HI/LO filter). Click pair stimuli were designed for seven different ICIs (0, 1, 2, 3, 4, 5, and 8 ms) and a lag-ITD of 300 μs.

The response recorded to the double-click stimulus consists of a CEOAE to the lead click, a CEOAE to the lag click, and a nonlinear component that depends on the ICI (Verhulst et al. 2011a). Kemp and Chum (1980a) developed a technique to remove the CEOAE component from the leading click while keeping the CEOAE component to the lagging click and the nonlinear component due to the ICI. This technique, as adapted by Kapadia and Lutman (2000b), was used here to calculate the derived suppressed (DS) response of the lagging click. Figure 1B illustrates this interleaved procedure adopted for stimulus presentation (Verhulst et al. 2011a). For each ICI and ITD condition, 1,800 repetitions of the following three stimuli were presented: SC, double click (DC; two condensation clicks), and double-click inverted (DCI; one condensation and one rarefaction click). The unsuppressed response (US) corresponded to the SC recordings. The DS response was obtained by subtracting the DCI response from the DC response and by halving the result. The DS response thus consisted of the CEOAE component due to the lagging click and the nonlinear component due to the ICI. The lag suppression was calculated as the root-mean-square (rms) level difference between DS and US responses in a time frame of 6–18 ms after click onset. Both monaural and binaural stimulations were tested. As no difference in lag suppression level was found between the two stimulations, it was decided to present the stimuli binaurally to extract monaural CEOAE lag suppression.
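The derivation of the DS response and of the lag suppression can be illustrated with a short sketch. Writing the DC response as lead + lag + nonlinear component and the DCI response as lead − lag − nonlinear component, the half-difference cancels the lead emission. The sign convention (US level minus DS level, so that suppression is positive), the function names, and the recording sampling rate passed as `fs` are assumptions made for illustration.

```python
import math


def derived_suppressed(dc, dci):
    """DS = (DC - DCI) / 2, computed sample by sample on averaged waveforms."""
    return [(a - b) / 2.0 for a, b in zip(dc, dci)]


def rms_level_db(x):
    """RMS level in dB relative to unit amplitude (x must not be all zeros)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(rms)


def lag_suppression_db(us, ds, fs, t_start_ms=6.0, t_stop_ms=18.0):
    """Level difference between US and DS in the 6-18 ms analysis window.

    Assumed sign convention: positive values mean that the lag emission
    is reduced relative to the single-click emission.
    """
    i0 = round(t_start_ms * 1e-3 * fs)
    i1 = round(t_stop_ms * 1e-3 * fs)
    return rms_level_db(us[i0:i1]) - rms_level_db(ds[i0:i1])
```

With synthetic waveforms in which the nonlinear component halves the lag emission, this returns 20·log10(2), i.e., about 6 dB of suppression.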

ABR recordings

The electrodes were placed according to the 10–10 system (American Clinical Neurophysiology Society), using a tight-fitting elastic cap that holds the electrodes in position (Picton 2011). Four electrodes were used: Cz (at the vertex, halfway between nasion and inion), Fz (at the forehead at three tenths of nasion–inion distance), M1 (left mastoid), and M2 (right mastoid). The electrode Cz was used as a reference and the electrode Fz as ground. Low impedances (below 2 kΩ) were achieved by carefully degreasing the test subject's scalp with alcohol and an abrasive electrolyte gel. The stimuli were played back and sent to the soundcard (RME FireFace 800 D/A converter, RME Intelligent Audio Solutions, Germany). The clicks were calibrated at a level of 75 dB peSPL in a BK-2012 ear-canal coupler (Brüel & Kjær Sound & Vibration Measurement A/S, Denmark), attached to a BK-4157 artificial-ear calibrator. The stimuli were presented to the left and right ear of the test subjects via two ER-2 earphones (Etymotic Research, Inc., Elk Grove Village, IL). The electrodes were connected to an EEG amplifier (Synamps 5803), responsible for the amplification and A/D conversion of the recorded potentials. The output of the amplifier was connected to the recording PC where the EEG-data were post-processed. The average, variance, and covariance of the evoked responses were calculated, and the resulting waveform was bandpass filtered with a FIR filter with cut-off frequencies of 200 and 1,500 Hz. Deviants were presented for seven different ICIs (0, 1, 2, 3, 4, 5, and 8 ms) and a lag-ITD of 300 μs. For each ICI and ITD condition, the 25-ms-long epoch containing the deviant stimulus was presented 2,000 times (Fig. 1C).

In the data analysis, the wave V amplitude peaks of the lead were determined as the maximum voltage (absolute value) in a time range of 6.5–7.5 ms after stimulus onset (Damaschke et al. 2005). The wave V amplitude peaks of the lag were determined with a similar procedure, in a time range shifted in latency according to the ICI and the ITD.
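The peak-picking step can be sketched as a windowed search for the largest absolute voltage. The lead window of 6.5–7.5 ms is taken from the text; the exact shift applied to the lag window is not given, so the `shift_ms` argument (for example the ICI, or the ICI plus the ITD for the delayed ear) is an assumption, as are the function name and the sampling rate used in the example.

```python
def wave_v_peak(abr, fs, t_start_ms=6.5, t_stop_ms=7.5, shift_ms=0.0):
    """Return (latency_ms, amplitude) of the largest |voltage| in the window.

    abr is one averaged, filtered epoch; fs is its sampling rate in Hz.
    shift_ms moves the search window later in time for the lagging click.
    """
    i0 = round((t_start_ms + shift_ms) * 1e-3 * fs)
    i1 = round((t_stop_ms + shift_ms) * 1e-3 * fs)
    idx = max(range(i0, i1 + 1), key=lambda i: abs(abr[i]))
    return idx * 1000.0 / fs, abs(abr[idx])


# Synthetic 25 ms epoch at a hypothetical 10 kHz rate: lead peak at 7.0 ms,
# lag peak 4.3 ms later (ICI of 4 ms plus an ITD of 0.3 ms).
epoch = [0.0] * 250
epoch[70] = 0.4
epoch[113] = -0.2
lead_peak = wave_v_peak(epoch, fs=10_000)
lag_peak = wave_v_peak(epoch, fs=10_000, shift_ms=4.3)
```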

Statistical analysis

CEOAE

The data obtained for the DS and US conditions were divided into five blocks of 360 averages each. Mean and rms level were calculated for each block and suppression was calculated for the 25 combinations of level difference between the DS and US conditions. The standard deviation (SD) was calculated over the 25 values of suppression (Verhulst et al. 2011a).
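The block procedure amounts to pairing every US block with every DS block. A minimal sketch follows, assuming that the per-block rms levels (in dB) are already available, that suppression is the US level minus the DS level, and that the sample SD (n − 1 denominator) is used; none of these choices is specified in the text, and the example levels are hypothetical.

```python
import itertools
import statistics


def block_suppression(us_levels_db, ds_levels_db):
    """Pair every US block with every DS block (5 x 5 = 25 combinations).

    Returns (suppression values, their mean, their sample SD).
    """
    diffs = [us - ds for us, ds in itertools.product(us_levels_db, ds_levels_db)]
    return diffs, statistics.mean(diffs), statistics.stdev(diffs)


# Hypothetical rms levels (dB) of five blocks of 360 averages per condition.
us_blocks = [10.0, 10.0, 10.0, 10.0, 10.0]
ds_blocks = [4.0, 5.0, 6.0, 5.0, 5.0]
values, mean_db, sd_db = block_suppression(us_blocks, ds_blocks)
```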

ABR

SDs of the ABR recordings were calculated as the square root of the time-averaged variances. Normal distributions were built from the mean and SD of the wave-Vs of lead and lag. A normal distribution of lag-wave V suppression and its SD were obtained by random sampling from the distributions of the lead and lag wave-Vs.
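The sampling step can be sketched as a small Monte Carlo simulation. The text does not give the formula for the lag-wave V suppression or the number of draws; the sketch assumes a level ratio in dB, 20·log10(lead/lag), and 10,000 draws, and it discards draws with non-positive amplitudes. All names and the example amplitudes are hypothetical.

```python
import math
import random
import statistics


def wave_v_suppression(lead_mean, lead_sd, lag_mean, lag_sd,
                       n_draws=10_000, seed=0):
    """Mean and SD of the lag-wave V suppression in dB (assumed definition).

    Lead and lag wave V amplitudes are drawn from normal distributions built
    from their means and SDs; each pair of draws gives one suppression value.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_draws):
        lead = rng.gauss(lead_mean, lead_sd)
        lag = rng.gauss(lag_mean, lag_sd)
        if lead > 0.0 and lag > 0.0:        # a dB ratio needs positive amplitudes
            values.append(20.0 * math.log10(lead / lag))
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical amplitudes in microvolts: lead 0.40 +/- 0.01, lag 0.20 +/- 0.01.
mean_db, sd_db = wave_v_suppression(0.40, 0.01, 0.20, 0.01)
```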

Confidence interval and significance testing

For each subject, a statistical analysis was carried out to investigate whether the CEOAE-derived and the ABR-derived lag suppression was significantly different below and above the individual echo thresholds (Table 1). For each subject, mean values of lag suppression below and above the echo threshold were calculated from all data points below and above the threshold, respectively. SDs of the mean lag suppression below and above the threshold were obtained by taking the square root of the summed variances, divided by the number of data points (Bienaymé formula). Two normal distributions for data below and above the echo threshold were built from the calculated means and SDs, and 10,000 random samples were then drawn from each distribution. These two sets of random samples were subtracted to obtain an estimate of the difference distribution of lag suppression below vs. above the echo threshold, and 95 % confidence intervals (CIs) were calculated for these difference distributions. As the sample size of CEOAE and ABR recordings differed, a conservative approach was adopted such that the CIs were defined as the mean of each difference distribution ± 1.96 SD. Significance testing was carried out by checking whether the CIs contained zero. CIs that did not contain zero (asterisks in Table 1) indicated that lag suppression differed significantly between ICIs below and above the echo threshold. The indicated p values were calculated using the z statistic as p = exp(−0.717 · z − 0.416 · z²) (Altman and Bland 2011).
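The procedure above can be summarized in a short sketch. The Bienaymé step, the 10,000 draws, the ± 1.96 SD interval, and the p-value formula follow the text; taking z as the absolute mean of the difference distribution divided by its SD is an assumption, as are the function names and the example values.

```python
import math
import random
import statistics


def sd_of_mean(sds):
    """Bienaymé formula: SD of the mean of independent data points."""
    return math.sqrt(sum(s * s for s in sds)) / len(sds)


def compare_below_above(below, above, n_draws=10_000, seed=0):
    """below/above: lists of (mean, sd) of lag suppression, one pair per ICI.

    Returns (ci_low, ci_high, p, significant) for the difference below - above.
    """
    rng = random.Random(seed)
    mu_b = statistics.mean([m for m, _ in below])
    mu_a = statistics.mean([m for m, _ in above])
    sd_b = sd_of_mean([s for _, s in below])
    sd_a = sd_of_mean([s for _, s in above])
    diff = [rng.gauss(mu_b, sd_b) - rng.gauss(mu_a, sd_a) for _ in range(n_draws)]
    d_mean = statistics.mean(diff)
    d_sd = statistics.stdev(diff)
    ci_low, ci_high = d_mean - 1.96 * d_sd, d_mean + 1.96 * d_sd
    z = abs(d_mean) / d_sd                          # assumed definition of z
    p = math.exp(-0.717 * z - 0.416 * z * z)        # Altman and Bland (2011)
    significant = not (ci_low <= 0.0 <= ci_high)
    return ci_low, ci_high, p, significant


# Hypothetical suppression values (dB): two ICIs below, two above the threshold.
result = compare_below_above([(5.0, 1.0), (4.0, 1.0)], [(1.0, 1.0), (0.5, 1.0)])
```

For z = 1.96 the approximation gives p ≈ 0.05, so the CI criterion and the p value agree at the 5 % level.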

TABLE 1 Mean lag suppression and standard deviation (in decibels), calculated for each subject, for ICIs below and above the individual echo thresholds

RESULTS

Psychoacoustical experiments

The individual and mean results of the fusion test are presented in Figure 2A. The figure shows the ICIs for which fusion occurred, both for monaural (monaural left, blue bar; monaural right, red bar) and binaural stimulation (black bar). The breakdown of fusion corresponds to the echo threshold. The mean results show similar echo thresholds for binaural stimulation (4.6 ms) and monaural left stimulation (4.5 ms). For monaural right stimulation, a lower value of 4 ms was observed, due to the additional delay of 300 μs introduced by the ITD. The similar values for the echo thresholds obtained in the monaural and binaural conditions suggest a fusion mechanism that does not depend on binaural processes. This is consistent with other studies where similar echo thresholds were found in the absence and presence of binaural cues (Rakerd et al. 1997) and for subjects with monaural deafness and normal-hearing subjects (Litovsky et al. 1997).

FIG. 2

Psychoacoustical results. A Individual and mean results of the fusion test for binaural (black bars) and monaural stimulation (monaural right, red bars; monaural left, blue bars) by deviants with a lag-ITD of 300 μs. B Mean behavioral thresholds obtained from the lateralization test (circles) and ITD-detection test (squares). The error bars represent the standard error of the mean. C Lateralizations reported the most by the six subjects over three repetitions of the lateralization test (symbols) and mean lateralization threshold (black curve). The different markers represent the six response buttons (left, center, center and center, center and left, center and right, and center and left and right). The size of the symbols indicates at what percentage the lateralization was reported over 18 responses (six subjects, three repetitions): small symbols, below 50 %; medium symbols, between 50 and 70 %; large symbols, above 70 %.

Figure 2B presents the mean ITD-detection thresholds (squares) and lateralization thresholds (circles). The ITD-detection threshold, i.e., the minimum lag-ITD to obtain a noncentered percept of the deviant, increased up to 590 μs for ICIs between 0 and 4 ms, and then decreased again for ICIs above 4 ms. Large threshold values indicated strong lag-discrimination suppression. For an ICI of 0 ms, no lag-discrimination suppression occurred (i.e., lead and lag had the same weight in lateralization) and all subjects could detect the deviants at the shortest ITD presented (150 μs). For an ICI of 8 ms, the ITD threshold was 340 μs, which was significantly higher than the baseline threshold for an ICI of 0 ms (p < 0.05, two-sample right-tailed t test) and not significantly lower than the threshold at 5 ms (p = 0.074, two-sample right-tailed t test), indicating that lag-discrimination suppression was still present for a lead–lag delay of 8 ms (and ITDs below the threshold). The ITD-threshold obtained here showed an ICI range over which lag-discrimination suppression occurred that is in agreement with previous studies (Zurek 1980; Damaschke et al. 2005).

The lateralization test refined the ITD-detection test by specifying the lateralization of a lead–lag pair as a function of the ICI. The difference from the previous test was that the task in this experiment was not only to detect the ITD contained in the lead–lag pair, but also to specify the perceived lateralization of the lead–lag pair. For each subject, the threshold was calculated as the minimum ITD producing at least two (out of three) noncentered percepts of the deviant. Figure 2B shows the mean lateralization threshold (circles), where the error bars indicate the standard error of the mean. The lateralization threshold curve showed values similar to those of the ITD-detection threshold function for all ICIs except 3 ms, where the lateralization threshold was significantly larger than the detection threshold (p = 0.029, Wilcoxon rank sum test). The largest thresholds were obtained for ICIs of 2–3 ms. For longer ICIs, the threshold curve decreased again, reaching 300 μs for an ICI of 8 ms. Although not at baseline level (150 μs), this value was significantly lower than the threshold at 5 ms (p = 0.021, two-sample right-tailed t test).

In Figure 2C, the mean lateralization threshold (black curve) is represented together with the lateralizations that were reported the most by the six subjects. The different symbols represent the different response buttons, whereas the size of the symbols shows at what percentage the lateralization was reported over 18 responses (six subjects and three repetitions). Small symbols indicate the lateralizations that were reported less than nine times (i.e., below 50 %). Medium-sized and large symbols represent reported lateralizations corresponding to between 50 and 70 % and above 70 %, respectively. The black symbols indicate perception of the lead–lag pair at the lead location, i.e., when lag-discrimination suppression occurred. Colored symbols show the release from lag-discrimination suppression. Fused percepts are indicated by the squared symbols.

For an ICI of 0 ms, the blue squares show that lead and lag had the same weight in lateralization (i.e., summing location), as subjects reported hearing a SC towards the left more than 70 % of the time. For ICIs between 1 and 4 ms, lag-ITDs below 600 μs show a strong lag-discrimination suppression (black symbols), whereas ITDs above 600 μs indicate a release from lag-discrimination suppression (colored symbols), even though difficulties were reported in consistently lateralizing the lag (small symbols). For ICIs above 4 ms, the results for all ITDs indicated that lead and lag were no longer perceived as fused. Despite the breakdown of fusion, lag-discrimination suppression was still observed for ICIs of 5 and 8 ms at short ITDs (black diamonds). For large ITDs, the subjects reported perceiving a diffuse sound image inside the head (green circles).

In summary, the results from the three perceptual experiments estimated fusion to occur within an ICI range up to 4.6 ms, and lag-discrimination suppression to last for longer ICIs (at least up to 8 ms).

CEOAE

When the auditory system is stimulated by a click, the forward travelling wave created along the basilar membrane (BM) can be reflected by preexisting random BM impedance irregularities (Shera and Guinan 1999; Zweig and Shera 1995). These irregularities are inherent to a healthy cochlea and may reflect small cell-to-cell differences in outer-hair cell amplification and alignment, which can be thought of as place-fixed BM impedance irregularities. Through a mechanism of coherent reflection, the BM irregularities are assumed to give rise to a backwards traveling wave that can be recorded in the ear canal as a CEOAE (Zweig and Shera 1995). CEOAEs contain information about the BM processing at the cochlear regions where the emission was generated (Moleti et al. 2008; Shera et al. 2002). When the cochlea is stimulated with lead–lag pairs, both the lead and lag elicit a CEOAE. It has been shown that, when preceded by the lead, the CEOAE elicited by the lag is reduced in amplitude compared with a CEOAE elicited by the lag presented in isolation (Kapadia and Lutman 2000; Verhulst et al. 2011a). This CEOAE amplitude reduction, which depends on the lead–lag delay, presumably reflects attenuation of the BM response to the lagging click, and will be referred to as peripheral lag suppression in the following.

Figure 3A shows the spectra of the recorded CEOAEs for one representative subject KE. The spectrum represented in gray is the US, which is the emission elicited by the lag presented in isolation. The superimposed spectrum (white) is the DS response which represents the derived emission of the lag when preceded by the lead. The difference between US and DS (gray region) indicates peripheral lag suppression for three ICI conditions of 2 (left panel), 4 (middle panel), and 8 ms (right panel). The results show that lag suppression was maximal for an ICI of 2 ms and almost negligible for an ICI of 8 ms. Consistent with previous studies (Verhulst et al. 2011b, 2013), the figure also shows that the release of lag suppression first occurred at the highest frequencies (e.g., at 4 kHz for an ICI of 4 ms), and later at lower frequencies (e.g., at 2 kHz for an ICI of 8 ms). This frequency-dependent release of suppression as a function of ICI appears to be related to BM impulse response duration, where higher characteristic frequencies exhibit a shorter time range of impulse response lead–lag interactions. Thus, the peripheral lag suppression obtained from CEOAE recordings appears to reflect mechanical BM impulse response lead–lag interactions.

FIG. 3

CEOAE results. A Spectra of the recorded CEOAEs for the single-click condition, i.e., the unsuppressed response (US), and for the derived suppressed response (DS, obtained from (DC-DCI)/2 in Fig. 1B) of the lagging click, for one representative subject KE. The difference between US and DS (the area displayed in gray) represents peripheral lag suppression for ICIs of 2, 4, and 8 ms. B Individual (gray curves) and mean (black curves) results of peripheral lag suppression as a function of the ICI for monaural left and right stimulation. The error bars indicate the standard error of the mean.

In Figure 3B, peripheral lag suppression is represented as a function of ICI. The figure shows individual (gray curves) and mean data (black curves) of peripheral lag suppression for monaural left (left panel) and monaural right (right panel) stimulation, for lead–lag pairs with an ITD of 300 μs. The mean data show a large suppression of the lag (between 3 and 6 dB) for lead–lag delays up to 4 ms. Above an ICI of 4 ms, the mean peripheral lag suppression decreased to 2 dB at 5 ms and 0.5 dB at 8 ms.

A statistical analysis was conducted on the null hypothesis that the difference of individual suppression, calculated for ICIs below and above individual echo thresholds, was zero (95 % CI). All test subjects showed peripheral lag suppression that was significantly larger for ICIs below the individual echo threshold than above it (Table 1).

ABR

ABRs are auditory-evoked potentials that reflect synchronized neural activity generated at the level of the auditory nerve (AN) and the auditory brainstem. Wave V is typically the most prominent peak in the ABR and is considered to reflect activity stemming from the superior olivary complex in the brainstem (Picton 2011).

When stimulating with click pairs, both lead and lag typically elicit a wave V. If the lag suppression obtained in the CEOAEs indeed reflects BM lead–lag interactions, an analogous response reduction is also expected in the ABR to monaural stimulation (i.e., in the lag-wave V amplitude). Figure 4A shows the ABR recordings of one representative subject (KE) to binaural stimulation (black curve, left panel) and monaural stimulations (blue and red curves, right panel), for an ITD of 300 μs. Wave V amplitude peaks are indicated by downward-pointing triangles. The results show that the leading click evoked a wave V that was constant in amplitude and latency for all ICIs, whereas wave V elicited by the lagging click was initially lower in amplitude for short ICIs and gradually increased in amplitude and latency as ICI increased. Figure 4B shows individual (gray curves) and mean (black, blue and red curves) lag-wave V reductions as a function of the ICI for monaural left (left panel, blue curve), monaural right (right panel, red curve) and binaural stimulation (left panel, black curve). The mean data show a lag-wave V reduction of up to 10 dB for lead–lag delays of 1 and 2 ms. The reduction obtained for binaural stimulation (black curve, left panel) was not larger than the reduction for monaural left stimulation (blue curve, left panel).

FIG. 4

ABR results. A ABR recordings for one representative subject KE, for monaural (right panel, red and blue curves) and binaural (left panel, black curve) stimulation and different ICI conditions. The error bars at a latency of 6 ms indicate the time-averaged SD of the recording. The horizontal dashed lines depict the zero voltage reference, and the bar scale at a latency of 16 ms indicates a voltage of 0.4 μV. B Individual (gray curves) and mean (black, blue, and red curves) results of lag wave V reduction obtained from ABR recordings for monaural (gray, blue, and red curves) and binaural stimulation (black curve), as a function of ICI. The error bars indicate the standard error of the mean.

A comparison with the behavioral echo thresholds (Table 1; Fig. 5) revealed that all subjects showed a lag-wave V reduction that was larger for ICIs below the echo threshold than above it. This result was significant (analysis of 95 % CI of the difference distribution) for three out of six subjects for monaural right stimulation, for one subject for monaural left stimulation, and for three subjects for binaural stimulation (Table 1).

FIG. 5

Comparison of mean lag suppression from OAEs (dashed curves), lag wave V reduction from ABRs (solid curves), and behavioral echo thresholds (vertical dashed lines) for monaural and binaural stimulation. The error bars indicate the standard error of the mean.

DISCUSSION

Effect of frequency range and implications for peripheral processing

Previous studies regarding the auditory processes underlying the PE (Divenyi 1992; Divenyi and Blauert 1987; Dizon and Colburn 2006; Shinn-Cunningham et al. 1995; Tollin and Henning 1999; Wolf et al. 2010; Xia and Shinn-Cunningham 2011) investigated the frequency dependence of localization dominance and lag-discrimination suppression. Two main hypotheses emerged: Divenyi and Blauert (1987) and Blauert and Divenyi (1988) proposed the “spectral overlap” concept, where lag-discrimination suppression was greatest (i.e., ITD thresholds were largest) for a large spectral overlap between the lead and the lag stimuli. Thus, they suggested that discrimination suppression operated within frequency bands (corresponding to peripheral auditory filters). An alternative concept of “localization strength” was proposed by Divenyi (1992) who found that localization dominance decreased with decreasing lead center frequency, i.e., a low-frequency lead suppressed the spatial information of a high-frequency lag more strongly than when they were both centered at the same high frequency. This second hypothesis assumed a discrimination suppression mechanism operating across frequency bands. Consistent with the localization strength hypothesis, Shinn-Cunningham et al. (1995) showed that low frequency stimuli dominated over high-frequency stimuli in ITD-detection tasks. Yang and Grantham (1997b) suggested that spectral overlap (i.e., processes operating within frequency bands) and localization strength (i.e., processes across frequency bands) are two independent processes governing discrimination suppression.

Other studies investigated the frequency dependence of the PE by using spectrally identical lead and lag stimuli. By varying the center frequency of the lead–lag pair, these studies investigated within-frequency-band effects as a function of frequency. Localization dominance was found to be longer lasting and more pronounced for low-frequency lead and lag stimuli than for high-frequency stimuli (Lindemann 1986; Tollin and Henning 1999; Dizon and Colburn 2006; Wolf et al. 2010). This frequency-dependent behavior, where localization dominance was demonstrated to decrease with increasing center frequency, strongly supported the contribution of peripheral auditory processing to the PE (Tollin 1998; Hartung and Trahiotis 2001; Wolf et al. 2010; Xia and Shinn-Cunningham 2011). In fact, due to the mechanical properties of the BM, lead and lag exhibit shorter impulse responses and, therefore, shorter interactions when they are both centered at higher frequencies than when they are both centered at lower frequencies.
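The dependence of impulse response duration on center frequency can be illustrated with a simple filter model. The sketch below assumes, purely for illustration, that a fourth-order gammatone envelope with bandwidths from the Glasberg and Moore ERB formula stands in for the local BM impulse response. The -20 dB criterion for "effective duration" and all function names are arbitrary choices and are not taken from the study.

```python
import numpy as np

def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_envelope(t, fc_hz, order=4, b=1.019):
    """Envelope of a gammatone impulse response, normalized to a peak of 1.

    Assumption: the gammatone is only a crude proxy for BM mechanics.
    """
    env = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb_hz(fc_hz) * t)
    return env / env.max()

def effective_duration_ms(fc_hz, floor_db=-20.0, fs=100_000, t_max=0.05):
    """Last time (ms) at which the envelope is still above floor_db re. peak."""
    t = np.arange(1, int(t_max * fs)) / fs  # start at 1/fs to avoid log(0)
    env_db = 20.0 * np.log10(gammatone_envelope(t, fc_hz))
    last_above = np.nonzero(env_db >= floor_db)[0][-1]
    return 1000.0 * t[last_above]

# Durations shrink as the center frequency (and hence the bandwidth) grows.
for fc in (500, 1000, 2000, 4000):
    print(f"{fc} Hz: {effective_duration_ms(fc):.1f} ms")
```

In this simplified picture, two clicks separated by a given ICI overlap within a low-frequency channel for longer than within a high-frequency channel, which is the within-channel interaction discussed above.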

The current study tested this hypothesis experimentally, by measuring CEOAEs to spectrally identical lead and lag stimuli. The results revealed that the CEOAE lag suppression was highly frequency dependent, with longer lasting suppression at low frequencies (Fig. 3A). By experimentally supporting the previously mentioned studies, these results provide a strong link between BM impulse response duration and within-frequency channel effects reported in psychoacoustical experiments measuring the PE. Although across-frequency processes may also be present, this study shows how within-frequency band lead–lag interactions change over frequency and how this mechanism could affect the perception of a lead–lag pair.

The abovementioned studies investigated the frequency dependence of lead dominance and lag-discrimination suppression (i.e., localization tasks), whereas the current study also presented measures of fusion, which does not necessarily involve the extraction of spatial cues. Fusion and discrimination suppression might, to some extent, rely on independent mechanisms, as previously suggested (Yang and Grantham 1997a), and operate in different frequency regions. It has been shown that ITD detection most likely relies on low frequencies (Dizon and Colburn 2006; Tollin and Henning 1999), where the extraction of ITDs is most effective. In contrast, echo thresholds may be dominated by high frequencies, where the lead and lag impulse responses produce shorter interactions on the BM and can, therefore, be separated out for shorter delays than at lower frequencies. The psychoacoustical results of the current study (Fig. 2) showed slightly different ICI ranges over which fusion and lag-discrimination suppression occurred. While fusion broke down at 4.6 ms (Fig. 2A), lag-discrimination suppression was still strong for an ICI of 5 ms and present for an ICI of 8 ms (for an ITD of 150 μs, Fig. 2B, C). The shorter time range over which fusion occurred would, thus, support the hypothesis of dominance of high frequencies for echo threshold determination, where one can extract cues for the number (one or two) of perceived clicks at shorter ICIs than for lateralization.

Effects of peripheral processing on the PE

The CEOAE results (Fig. 3B) showed that peripheral suppression of the lagging click was maximal for lead–lag delays up to 4 ms, in agreement with previous studies (Kapadia and Lutman 2000; Verhulst et al. 2011a). For an ICI of 0 ms, the stimulus in the left channel was a SC with double amplitude. Here, no lag suppression occurred and the reduction of 3–4 dB with respect to the single-click condition resulted from the compressive behavior of the CEOAE level curve (Verhulst et al. 2011a). Thus, peripheral lag suppression, defined as the suppressive effect of the lead on the lag, was largest for ICIs between 1 and 4 ms.

A comparison of peripheral lag suppression and behavioral monaural echo thresholds (vertical dashed lines) is also presented in Figure 5. For all test subjects, lag suppression below the echo threshold was significantly larger than that observed above the echo threshold (Table 1). Figure 6 shows individual comparisons of peripheral lag suppression (blue and red dashed curves) and behavioral lateralization thresholds (black solid curves). This comparison revealed that large peripheral lag-suppression values were accompanied by higher lateralization thresholds (i.e., when the lagging clicks are monaurally attenuated at the level of the BM, it seems more difficult to lateralize the lag in behavioral tasks). However, while peripheral suppression seems largely responsible for elevating the lateralization thresholds for ICIs of 1–4 ms, other processes at higher stages may be responsible for raising the thresholds for ICIs of 5 ms (KE, SV) and 8 ms (thresholds higher than 150 μs), where OAE and ABR lag suppression was absent.

FIG. 6

Individual comparisons of behavioral lateralization thresholds (solid black curves (in microseconds)) and peripheral lag suppression (dashed blue and red curves (in decibels); blue diamonds, monaural left; red squares, monaural right).

These results provide evidence for a monaural and peripheral component of lag suppression, occurring for lead–lag delays within the precedence window, and suggest a relation between peripheral suppression effects and the perceptual PE.

The lag suppression observed in the CEOAEs is of peripheral origin and likely related to the processing at local sites of the BM where the emission was generated. The frequency-dependent release of suppression as a function of ICI (Fig. 3A) appears to be linked especially to the duration of the local BM impulse response: short ICIs lead to overlapping impulse responses that can cause lag suppression at both low- and high-frequency cochlear locations, whereas longer ICIs are only able to affect low-frequency BM impulse responses. Although there is no invasive study that relates CEOAEs to impulse responses recorded from the BM, a large body of OAE literature provides evidence that spectral components in CEOAEs reflect local BM processing (Kemp and Chum 1980b; Neely et al. 1988; Zweig and Shera 1995; Shera and Guinan 1999; Harte et al. 2009). Moreover, cochlear dispersion combined with coherent reflection filtering can explain why the short latencies of the CEOAE waveform contain high frequencies and the longer latencies contain low frequencies (Jedrzejczak et al. 2005; Moleti and Sisto 2008). The above studies support the view that lag suppression observed in CEOAE frequency components can be considered as reflecting complex interactions (both in phase and magnitude) of local BM impulse responses at those cochlear regions where the emission was generated. This view is further supported by two AN studies that recorded responses of single AN fibers to acoustic click pairs (Goblick and Pfeiffer 1969; Parham et al. 1996). While Parham et al. (1996) did not clarify whether lag suppression arose from adaptation in the AN itself or from cochlear processing that served as an input to the AN, Goblick and Pfeiffer (1969) referred to dynamics in local BM amplification to explain lag suppression.

Modeling studies that account for BM as well as higher-level processing can provide insight into this matter (Tollin 1998; Hartung and Trahiotis 2001; Xia and Shinn-Cunningham 2011). In the model of Hartung and Trahiotis (2001), two monaural lead–lag stimuli were processed through a left- and right-ear gammatone filterbank (Patterson et al. 1995) and a hair-cell transduction stage (Meddis 1986) before the outputs were processed by a binaural cross-correlation operation. Based on the monaural effects of BM filtering, (inner) hair-cell processing, and subsequent binaural processing, the model was shown to qualitatively account for some of the behavioral data associated with the PE (Wallach et al. 1949; Shinn-Cunningham et al. 1995). However, whereas the role of inner-hair-cell (IHC) processing was stressed in the framework of that modeling study, the results from the present study suggest that BM processing, and not IHC/AN processing, might provide the major link between the observed CEOAE-derived lag-suppression data and the behavioral data (in agreement with the model of Tollin 1998). Adaptation effects in the AN and subsequent neural stages may further contribute to the peripheral lag suppression that was shown to affect the perception of the PE in this study. For the click stimuli used in the present study, lag suppression caused by BM impulse response interactions may dominate over AN adaptation effects, which might be stronger for longer-duration stimuli.
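The processing chain of such models (peripheral filtering, hair-cell transduction, binaural cross-correlation) can be sketched for a single frequency channel. The code below is not the model of Hartung and Trahiotis (2001): it assumes a single FIR gammatone filter, replaces the Meddis (1986) hair-cell stage with plain half-wave rectification, and uses a hypothetical sampling rate, click timing, and function names. It is meant only to make the sequence of stages concrete, not to reproduce any reported result.

```python
import numpy as np

FS = 48_000  # sampling rate in Hz (assumed for this sketch)

def gammatone_ir(fc_hz, fs=FS, dur=0.03, order=4, b=1.019):
    """FIR gammatone impulse response (stand-in for BM filtering)."""
    t = np.arange(int(dur * fs)) / fs
    erb = 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)
    ir = (t ** (order - 1) * np.exp(-2 * np.pi * b * erb * t)
          * np.cos(2 * np.pi * fc_hz * t))
    return ir / np.max(np.abs(ir))

def click_train(onsets_s, fs=FS, dur=0.05):
    """Unit-amplitude clicks at the given onset times (seconds)."""
    x = np.zeros(int(dur * fs))
    for onset in onsets_s:
        x[int(round(onset * fs))] += 1.0
    return x

def lead_lag_pair(ici_s, lead_itd_s, lag_itd_s, t0=0.005):
    """Left/right click trains; a positive ITD means the left ear leads."""
    left = click_train([t0, t0 + ici_s])
    right = click_train([t0 + lead_itd_s, t0 + ici_s + lag_itd_s])
    return left, right

def peripheral_channel(x, fc_hz):
    """Gammatone filtering followed by half-wave rectification.

    Assumption: rectification replaces the Meddis hair-cell model.
    """
    y = np.convolve(x, gammatone_ir(fc_hz))[: x.size]
    return np.maximum(y, 0.0)

def best_itd_s(left, right, fc_hz, max_lag_s=0.001, fs=FS):
    """ITD (s) at the peak of the interaural cross-correlation."""
    l = peripheral_channel(left, fc_hz)
    r = peripheral_channel(right, fc_hz)
    max_lag = int(round(max_lag_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    corr = []
    for k in lags:
        if k >= 0:   # right delayed by k samples relative to left
            corr.append(np.dot(l[: l.size - k], r[k:]))
        else:        # left delayed by -k samples relative to right
            corr.append(np.dot(l[-k:], r[: r.size + k]))
    return lags[int(np.argmax(corr))] / fs

# Lead favors the left ear (+250 us), lag favors the right ear (-250 us).
for ici_ms in (1, 2, 4, 8):
    left, right = lead_lag_pair(ici_ms / 1000.0, 250e-6, -250e-6)
    itd_us = 1e6 * best_itd_s(left, right, fc_hz=500)
    print(f"ICI {ici_ms} ms: cross-correlation peak at {itd_us:.0f} us")
```

Because every stage here is a simplification, the printed peak positions should be read as a property of this sketch only. Comparing channels with different center frequencies shows how the overlap of lead and lag responses within a channel depends on the ICI.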

CEOAEs and monaural ABRs

The mean wave-V amplitude reductions (Fig. 4B, blue and red curves) obtained from ABR recordings for monaural stimulation were largest over a shorter ICI range (1–2 ms) than the peripheral lag suppression observed in the CEOAEs (Fig. 3B). Several aspects may account for this difference. First, peripheral lag suppression was measured as an amplitude reduction of the backward travelling wave, which contains information about specific reflection sites along the BM (e.g., Zweig and Shera 1995; Shera et al. 2002). In contrast, the ABR reflects neural activity elicited by the forward travelling wave and, in particular, represents the synchronous activity of neurons across the whole cochlear partition (Dau et al. 2000; Junius and Dau 2005). Even though both OAE and ABR results comprise monaural lead–lag interactions, the OAE only contains a subset of the frequency components present in the ABR. CEOAEs are, in fact, dominated by frequency components in the 1–2 kHz range where the middle-ear gain is largest (Puria 2003). Moreover, peripheral lag suppression in CEOAEs was observed to be frequency dependent, with longer-lasting suppression at low frequencies than at high frequencies (Verhulst et al. 2011b; Fig. 3A). Thus, the shorter time range of suppression obtained in the ABR results may be explained by the wider frequency window effective in ABRs versus CEOAEs. Second, ABRs not only reflect outer-hair-cell processing, as in the case of CEOAEs, but also represent effects of IHC processing and neural recovery times in the AN and brainstem.

Contributions of binaural processes

The mean lag-wave V reduction obtained with binaural stimulation (black curve in Fig. 4B, left panel) was not larger than the one obtained with monaural left stimulation (blue curve). The absence of binaural attenuation at the brainstem is consistent with previous results, which showed correlates of binaural lag suppression only in middle-latency responses but not in early-latency responses (Liebenthal and Pratt 1999), and with results showing correlates of binaural lag suppression in the pattern of late auditory-evoked potentials (Damaschke et al. 2005). Although the present study is in agreement with the absence of a binaural contribution to lag suppression at the brainstem level (Damaschke et al. 2005), the conclusion here differs with respect to the monaural mechanism occurring at stages below the brainstem. While previous studies (Damaschke et al. 2005; Fitzpatrick et al. 1999) concluded that monaural lag-suppression mechanisms occurring for ICIs below 5 ms originate from recovery times in neurons of the AN and brainstem, the present study presents evidence for mechanical BM lead–lag interactions as the main source of lag suppression for ICIs between 1 and 4 ms. When the cue for lateralization is carried by the lag, a mechanism of monaural suppression would account for the rise in the lateralization threshold for short ICIs. This is consistent with results from a recent study (Fisher et al. 2011) where monaural instantaneous frequency glides in the BM could account for characteristic features of binaural ITD processing. For ICIs larger than 5 ms (e.g., for an ICI of 8 ms in the current paper), where no peripheral suppression occurs, central (binaural) processes are likely responsible for raising the lateralization thresholds.

Furthermore, the comparison of monaural and binaural behavioral echo thresholds (Fig. 2A) did not show any contribution of binaural processes to fusion, in agreement with previous studies (Litovsky et al. 1997; Rakerd et al. 1997), suggesting that binaural processes might not be involved in echo threshold determination.

In conclusion, the results of the present study show a correlation between mechanical cochlear processes and psychoacoustical measures of the PE for short ICIs. Although low-level effects cannot be sufficient to account for all aspects of precedence, experimental evidence was provided that monaural peripheral suppression plays a fundamental role in the binaurally perceived PE for short lead–lag delays (i.e., 1–4 ms). Not only do BM lead–lag interactions occur within the same time range as the behaviorally determined precedence window for clicks, they also represent the main component of lag suppression at the level of the auditory brainstem. The findings of the present study apply to click stimuli. For stimuli of longer duration than clicks, inhibitory processes may account for some aspects of the PE (Braasch and Blauert 2003; Lindemann 1986; Xia et al. 2010). Longer durations of suppression (above 5 ms) may be explained by central processes occurring at stages above the brainstem (Blauert 1997; Damaschke et al. 2005; Liebenthal and Pratt 1999; Sanders et al. 2008).