INTRODUCTION

Although highly successful, unilateral cochlear implantation cannot provide the binaural information that is used to localize sounds and improve signal detection in noise. Some patients have therefore been implanted bilaterally in the hope of providing the advantages of binaural hearing. Although an important advantage associated with binaural hearing results from differences in the signal-to-noise ratio at each side of the listener, also known as the head-shadow or better-ear effect (e.g., Muller et al. 2002; Litovsky et al. 2004, 2006), other advantages arise from true binaural interaction. The main binaural cues are interaural intensity differences (IIDs) and interaural time differences (ITDs). In general, bilaterally implanted patients show IID performance similar to that of normal-hearing listeners. In contrast, thresholds for ITD discrimination tend to be higher than in normal-hearing listeners (van Hoesel and Clark 1995; van Hoesel and Clark 1997; Lawson, Wilson et al. 1998; Long, Eddington et al. 2003; van Hoesel and Tyler 2003; Schoen et al. 2005). This result is not altogether unexpected given the nature of ITD stimuli provided by most cochlear implant speech processors. In normal-hearing listeners, the lowest thresholds of 10–20 μs are based on ITD cues conveyed by the fine structure of low-frequency sounds (Zwislocki and Feldman 1956; Klumpp and Eady 1957), whereas thresholds to ITDs in the low-frequency envelopes of high-frequency carriers tend to be higher, typically 50–100 μs (Henning 1974; Neutzel and Hafter 1981; Bernstein and Trahiotis 1985). With most current cochlear implant speech processors, timing information associated with the fine structure is not provided, so a loss of sensitivity to ITDs is to be expected. However, ITD information carried by the temporal envelope should be available, and, in the best cases, the sensitivity to envelope ITDs in bilaterally implanted subjects approaches that of normal-hearing listeners (Wilson et al. 2003). 
Nevertheless, in most cases, relatively poor ITD sensitivity is obtained, even if the signals are delivered with good control over the timing in the fine structure and envelope (van Hoesel 2004; Wilson and Lawson 2006).

A possible complication in the processing of ITD information in electric hearing is that in many cases there is a mismatch in pitch associated with stimulation of numerically matched electrodes on each side, indicating that these electrodes stimulate tonotopically different populations of neurons. Binaural neurons act as coincidence detectors where maximal firing occurs when inputs from each side arrive at the same time, as in the original Jeffress (1948) model (see Joris et al. 1998 for a review). Modern implementations of this model are based on binaural elements that receive matched frequency inputs from each side (Colburn 1973; Stern and Colburn 1978). Presentation of bilaterally mismatched frequencies can stimulate these coincidence detectors because of the spread of excitation into bilaterally matched channels. However, with mismatched frequency inputs, fewer coincidences would be expected because at least one of the inputs must be off-CF, and thus have a lower average firing rate. In bilaterally implanted listeners, mismatch in place of stimulation from nominally matched pairs of contacts can occur because of differences in electrode placement, neural survival or anatomical differences affecting spread of current in each cochlea. To adjust for these differences, pitch matching can be performed to identify contacts that are most likely to maximize the stimulation of overlapping populations. Some laboratories report an improvement in encoding ITDs when careful pitch matching and subsequent stimulation of the matched contacts is performed (Long et al. 2003; Wilson et al. 2003). Other studies report little or no differences in threshold to ITDs across wide ranges of electrode contacts, including those identified through careful pitch matching (van Hoesel and Clark 1997). 
If accurate pitch matching is necessary to maximize performance, this would suggest that binaural timing information can be obtained only if information from the same frequency channels can be identified and appropriately combined. Alternatively, if ITD information can be combined across unmatched channels, then pitch matching might not be necessary for good binaural processing.

Physiological information about the ability of the binaural system to combine envelope ITD information across carrier frequencies is currently limited to a small number of neurons from the inferior colliculus (IC) in barbiturate-anesthetized cats, examined as part of a larger study (Yin et al. 1984). In that report, neurons in the inferior colliculus showed sensitivity to ITDs for frequency mismatches greater than 1 octave. The conclusion was that neurons could encode envelope ITDs as long as the carrier frequencies presented at each side fell in the response area of the neuron. This leads to the prediction that tolerance to carrier frequency mismatch should be greater at higher stimulus levels where the response area tends to be wider. However, by the level of the inferior colliculus, the convergence of excitation and inhibition from stimulation of each side is complex and it is not clear if such a simple mapping should hold. If, for example, a neuron shows nonmonotonic responses to envelope ITDs with intensity, then the carrier frequency/intensity combinations that show ITD sensitivity would not be predictable from the responses to single tones. The effect of intensity may be of considerable importance for binaural processing with electrical stimulation because of the limited dynamic range (Rubinstein et al. 1999). To better understand the effect of interaural frequency mismatches and the role of intensity on the encoding of ITDs in envelopes, we recorded from single neurons in the inferior colliculus of unanesthetized rabbits as these parameters were varied.

Psychophysical experiments were performed in humans for comparison with the physiological results. Previous psychophysical experiments with human listeners have measured envelope ITD discrimination thresholds for amplitude-modulated high-frequency tones using either matched or mismatched carrier frequencies; those studies noted a sharp increase in threshold with relatively small mismatches (Henning 1974; Neutzel and Hafter 1981). This result may appear to be at odds with the physiological data, which were interpreted as indicating relative insensitivity to carrier mismatches across ears (Yin et al. 1984). However, the previous physiological study compared neural responses to stimuli that spanned the full range of ITDs associated with the envelope frequency of sinusoidally amplitude-modulated (SAM) tones, and determined the range of carrier frequency mismatch that produced detectable levels of response modulation by the ITD. A similar paradigm was used in the current study. In contrast, psychophysical studies have typically determined the minimum change in the ITD that can be detected behaviorally. This procedural difference makes comparisons between the physiological and behavioral studies difficult. Consequently, the purpose of our psychophysical experiments was to collect data under comparable conditions to those used to characterize physiological sensitivity to envelope ITD. The goal of both sets of experiments was to identify the maximal range of mismatch from which usable ITD information could be obtained.

MATERIALS AND METHODS

Neural recordings

Two adult female Dutch-belted rabbits were used for the neural recordings. The animals were handled according to the standards and protocols of the Institutional Animal Care and Use Committee at the University of North Carolina at Chapel Hill, and the “Guide for the Care and Use of Laboratory Animals” published by the National Institutes of Health.

Surgical methods and recording procedures

Surgical and recording methods for the unanesthetized rabbit preparation have been previously described (Kuwada, Stanford et al. 1987; Fitzpatrick and Kuwada 2001). All surgical procedures were performed using aseptic techniques on rabbits with clear external ears. Under general anesthesia (ketamine, 35 mg/kg and xylazine, 5 mg/kg), a square brass rod was anchored to the skull using screws and dental acrylic. While the animal was anesthetized, earmolds were made by inserting a rod in the ear canal and packing impression compound (Hal-Hen, Long Island City, NY) around the rod. The rod was subsequently replaced with a hollow, 6.5 cm long sound tube that was cemented in the earmold. The sound tube also contained a probe tube used for calibration that extended 1 mm from the end of the sound tube to within about 2 cm of the tympanic membrane. After acclimation to the recording environment and listening to sounds through the earmolds, the animal was reanesthetized and a small rectangular craniotomy (approximately 3 × 4 mm) was made in the skull overlying the inferior colliculus. The exposed dura was covered with a sterilized medical elastopolymer (Dentsply International, Milford, Delaware) after this procedure and after each daily recording session.

All recordings were conducted in a double-walled, sound-attenuated chamber. The unanesthetized rabbit was placed in a cloth wrap and seated in a cradle from which its head protruded. The wrap and padding provided only mild restraint, their primary purpose being to discourage movements that might cause injury to the rabbit. The rabbit’s head was fixed by clamping to the surgically implanted brass rod. Once the rabbit was secured, the elastopolymer covering was removed to expose the opening in the skull. To reduce possible pain or discomfort during the penetration of the electrode, a topical anesthetic (1% lidocaine) was applied to the dura for approximately 5 min and then removed by aspiration. With these procedures, rabbits remained still for the recording session (2–3 h), an important criterion for neural recording. A session was terminated if the rabbit showed any signs of discomfort. Daily recording sessions were made for a period of 4–6 months per rabbit.

Extracellular recordings were made with glass-coated, platinum–tungsten microelectrodes (tip diameter of 1–2 μm, impedances of 10–20 MΩ). Electrode position was controlled via a piezoelectric microdrive (Burleigh Instruments, Fishers, NY) and calibrated in reference to bregma before each recording session, allowing consistent spatial orientation over the period of recording. The action potentials of single neurons or small clusters of neurons were amplified 5,000–25,000 times, isolated with the aid of a window discriminator and timed relative to the stimulus onset with an accuracy of 1 μs. All of the neurons illustrated as examples are single neurons where no other spike waveforms crossed the threshold of the window discriminator. If another waveform was even occasionally included in the response, it was classified as a small cluster. We did not include any obvious multiunit recording where there was no single dominant waveform.

Acoustic stimuli and data collection

Auditory stimuli were generated in MATLAB and presented through a two-channel digital stimulation system consisting of an array processor, digital-to-analog converters, antialiasing filters and programmable attenuators (Tucker-Davis System II, Alachua, Florida). Acoustic calibrations for amplitude and phase were performed at the outset of each recording session with a measuring amplifier connected to a 1/4″ microphone (B&K Instruments, Naerum, Denmark) attached to the distal end of the probe tube. Calibrating sounds were bands of noise spanning the range from 60 to 50,000 Hz. Sound levels throughout are expressed in dB re 20 μPa (dB SPL).

Data collected consisted of estimates of frequency tuning, the ability of tones to evoke excitation or inhibition from the contralateral and ipsilateral ears, tuning to ITDs in envelopes using tones and noise, and sensitivity to ITDs in envelopes using mismatched frequencies at the ears. The characteristic frequency (CF) was determined using an automatic tuning curve algorithm. Step size was 0.25 octaves over the range of 250–32,000 Hz. At each frequency step, the algorithm tracked the threshold of short tone bursts adaptively based on a user-supplied spike criterion (i.e., if the number of spikes within the stimulus period exceeded that expected from the spontaneous rate by a specified number of spikes it was scored a hit, and if it did not exceed the criterion it was scored a miss). The tone bursts were 75 ms duration presented every 300 ms with 4-ms linear on/off ramps. The final step size of the adaptive tracking was 1.5 dB, and the spike criteria ranged from 1 to 3 spikes. The best frequency (BF), or frequency of maximal firing, was determined by presenting tone bursts in isointensity frequency sweeps. The intensities used were typically 50 and 30 dB SPL. Recording sites with CFs or BFs <2 kHz were not studied further to avoid including neurons whose primary ITD sensitivity was to the fine structure rather than the envelope.
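As a rough illustration of the tuning-curve procedure described above, the following sketch builds the 0.25-octave frequency grid and applies the hit/miss spike criterion. The function names, the choice to score a count that exactly meets the criterion as a hit, and the example numbers are assumptions made for illustration. The adaptive level-tracking rule is not specified in the text and is not modeled here.

```python
import math

def tuning_frequencies(low_hz=250.0, high_hz=32000.0, step_octaves=0.25):
    """Frequencies from low_hz to high_hz in equal octave steps (inclusive)."""
    n_steps = round(math.log2(high_hz / low_hz) / step_octaves)
    return [low_hz * 2.0 ** (i * step_octaves) for i in range(n_steps + 1)]

def is_hit(spike_count, spontaneous_rate_hz, window_s, criterion_spikes):
    """Score a tone burst as a hit when the spike count exceeds the count
    expected from spontaneous firing by at least criterion_spikes.
    Treating equality as a hit is an assumption of this sketch."""
    expected_spontaneous = spontaneous_rate_hz * window_s
    return spike_count - expected_spontaneous >= criterion_spikes

# Hypothetical example: 75-ms burst, 10 spikes/s spontaneous rate, criterion of 2.
freqs = tuning_frequencies()
hit = is_hit(spike_count=3, spontaneous_rate_hz=10.0, window_s=0.075,
             criterion_spikes=2)
```

With the default arguments the grid spans 7 octaves, so it contains 29 test frequencies.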

The response to stimulation of each ear was tested with CF tones presented across a range of intensities to each ear alone and in combination. The response was classed as excitatory if the average rate increased during the presentation of a 75 ms tone. An inhibitory response could be demonstrated either by a decrease in spontaneous firing, if sufficient spontaneous rate was present, or by a decrease of the excitatory response generated by an above-threshold tone presented to the excitatory ear.

Sensitivity to envelope ITDs was tested using binaural beat stimuli and noise. The binaural beat stimuli were created using SAM tones with a 1-Hz difference between the modulation frequencies at each ear (as used by Yin et al. 1984). The carrier frequency presented to both ears was either the CF, if determined by a tuning curve, or the lowest-intensity BF, if determined by isointensity stimuli. Stimulus duration was 5.1 s, which results in five full cycles of a dynamically varying interaural phase difference (IPD) of the envelope that swept linearly every 1 s. Frequencies of SAM ranged from 25 to 800 Hz in 0.5 octave steps. The initial intensity was usually 50 or 70 dB SPL. From the response to this stimulus, a best modulation frequency was determined by inspection of the PST histograms. If the neuron responded equally well at multiple modulation frequencies, one of the lowest frequencies (typically <200 Hz) was chosen. Low modulation frequencies were preferentially chosen because sensitivity to fine structure ITDs is almost entirely absent below 200 Hz (Fitzpatrick et al. 2002); as such, a binaural response obtained at a rate of 200 Hz or lower is almost certainly because of envelope rather than fine structure cues. The best intensity for sensitivity to envelope ITDs was then determined for the best modulation frequency in 10 dB steps. All subsequent measurements of ITD tuning to envelopes of SAM tones were made at the best modulation frequency and best intensity, although different values of these parameters were sometimes tested as time permitted. Thus, the experimental design was to optimize the parameters for ITD sensitivity to the envelope before taking the measurements of interest for this study.
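The binaural beat stimulus described above can be sketched as two SAM tones whose modulation frequencies differ by 1 Hz. This is a minimal illustration rather than the stimulus code used in the study: the function names, the 100% modulation depth, the in-phase starting envelopes, and the assignment of the higher modulation frequency to the second channel are all assumptions.

```python
import math

def sam_tone(carrier_hz, mod_hz, duration_s, sample_rate_hz, depth=1.0):
    """Sinusoidally amplitude-modulated tone as a list of samples.
    A modulation depth of 1.0 (100%) is an assumption of this sketch."""
    n = int(round(duration_s * sample_rate_hz))
    samples = []
    for i in range(n):
        t = i / sample_rate_hz
        envelope = 1.0 + depth * math.sin(2.0 * math.pi * mod_hz * t)
        samples.append(envelope * math.sin(2.0 * math.pi * carrier_hz * t))
    return samples

def binaural_beat_pair(carrier_hz, mod_hz, duration_s, sample_rate_hz, beat_hz=1.0):
    """Two SAM tones with the same carrier and modulation rates differing by beat_hz."""
    ear_a = sam_tone(carrier_hz, mod_hz, duration_s, sample_rate_hz)
    ear_b = sam_tone(carrier_hz, mod_hz + beat_hz, duration_s, sample_rate_hz)
    return ear_a, ear_b

def envelope_ipd_cycles(t_s, beat_hz=1.0):
    """Envelope interaural phase difference (in cycles, 0 to 1) at time t_s,
    assuming both envelopes start in phase."""
    return (beat_hz * t_s) % 1.0

def full_beat_cycles(duration_s, beat_hz=1.0):
    """Number of complete IPD sweeps contained in the stimulus."""
    return math.floor(duration_s * beat_hz)

# Short excerpt for illustration; the study used a 5.1-s stimulus.
left, right = binaural_beat_pair(4000.0, 100.0, 0.01, 48000.0)
```

Because the envelope IPD advances linearly at the beat rate, a 5.1-s stimulus with a 1-Hz beat contains five complete sweeps of IPD.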

To further determine that the sensitivity to ITDs was caused by the envelope and not by the fine structure, delay curves to broadband noise (100 Hz to at least several kHz higher than the CF of the neuron) were taken with the phase of the noise fully correlated and again with the phase of the noise in one ear inverted or anticorrelated. If the noise delay functions in these two conditions do not invert, i.e., peaks become troughs and vice versa, then the response can be attributed to ITDs in the envelope (Joris 2003). If the curves are dissimilar but without complete inversion, then a mix of sensitivity to ITD in the fine structure and envelope can be inferred (Joris 2003). Any neuron that demonstrated response to the fine structure was not included in this study.

The effect of introducing the carrier frequency mismatches was studied using the binaural beat stimulus. The carrier frequency at the ipsilateral ear was held at the CF whereas the frequency at the contralateral ear was shifted in 0.25 octave steps. The measurements were then repeated holding the contralateral ear at CF and varying the carrier frequency in the ipsilateral ear. As time permitted, frequency mismatches were studied at additional intensities in 10 dB steps.
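The mismatch series can be illustrated by stepping one ear's carrier in 0.25-octave increments around the frequency held fixed at the other ear, and by expressing the tested range in octaves. The function names and the particular offsets in the example are hypothetical and were chosen only for illustration.

```python
import math

def octave_steps(fixed_hz, lowest_offset_oct, highest_offset_oct, step_oct=0.25):
    """Carrier frequencies spaced step_oct octaves apart, expressed as
    offsets (in octaves) from the frequency held fixed at the other ear."""
    n_steps = round((highest_offset_oct - lowest_offset_oct) / step_oct)
    return [fixed_hz * 2.0 ** (lowest_offset_oct + i * step_oct)
            for i in range(n_steps + 1)]

def span_octaves(low_hz, high_hz):
    """Width of a frequency range in octaves."""
    return math.log2(high_hz / low_hz)

# Hypothetical example: one ear fixed at 4,000 Hz, the other ear varied
# from 2.5 octaves below to 0.75 octaves above that frequency.
series = octave_steps(4000.0, -2.5, 0.75)
span = span_octaves(series[0], series[-1])
```

In this example the varied carrier runs from about 707 Hz to about 6,727 Hz in 14 steps.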

Data analysis

The main measures of sensitivity to the interaural phase were the synchronization index, or vector strength, and the mean interaural phase of the response (Goldberg and Brown 1969). These measures are available from period histograms binned on a 1-Hz cycle of the binaural beat stimulus. The vector strength is a measure of how concentrated the firing is to a particular phase. It varies from zero to one with one indicating all responses within a single bin of the period histogram and zero indicating random firing with respect to the stimulus phase. The criterion for significant synchronization was p < 0.001 using the Rayleigh test of uniformity (Mardia and Jupp 1999). The mean phase is the vector average of all of the phases at which responses were obtained.
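A minimal sketch of these measures follows. It computes vector strength and mean phase directly from spike times rather than from a binned period histogram, and it uses the common large-sample approximation p ≈ exp(−nR²) for the Rayleigh test. Both choices, along with the function names, are assumptions of this illustration and are not necessarily the computation used in the study.

```python
import math

def spike_phases(spike_times_s, beat_hz=1.0):
    """Phase of each spike (in cycles, 0 to 1) within the binaural beat cycle."""
    return [(t * beat_hz) % 1.0 for t in spike_times_s]

def vector_strength(phases_cycles):
    """Return (vector strength, mean phase in cycles) for a list of spike phases."""
    n = len(phases_cycles)
    if n == 0:
        raise ValueError("at least one spike is required")
    c = sum(math.cos(2.0 * math.pi * p) for p in phases_cycles)
    s = sum(math.sin(2.0 * math.pi * p) for p in phases_cycles)
    strength = math.hypot(c, s) / n          # length of the mean resultant vector
    mean_phase = (math.atan2(s, c) / (2.0 * math.pi)) % 1.0
    return strength, mean_phase

def rayleigh_p(strength, n_spikes):
    """Approximate Rayleigh-test p value, p ~ exp(-n * R^2) (large-sample form)."""
    return math.exp(-n_spikes * strength ** 2)

def is_significant(strength, n_spikes, alpha=0.001):
    """Apply the p < 0.001 criterion for significant synchronization."""
    return rayleigh_p(strength, n_spikes) < alpha
```

Spikes that all fall at the same phase of the beat cycle give a vector strength of 1, whereas spikes spread evenly around the cycle give a value near 0.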

The discharge rate at each frequency was also determined, as was the sync-rate. The sync-rate is the product of the discharge rate and the synchronization index. The sync-rate is an estimation of the Fourier transform of the noise delay curve (Yin et al. 1986) in that it represents the proportion of the ITD-sensitive part of the response that can be attributed to each frequency.
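The sync-rate computation is a simple product, sketched here for clarity. The function names and the example values are hypothetical.

```python
def discharge_rate(spike_count, total_duration_s):
    """Average discharge rate in spikes per second."""
    if total_duration_s <= 0:
        raise ValueError("duration must be positive")
    return spike_count / total_duration_s

def sync_rate(rate_spikes_per_s, vector_strength):
    """Sync-rate: discharge rate weighted by the synchronization index."""
    if not 0.0 <= vector_strength <= 1.0:
        raise ValueError("vector strength must lie between 0 and 1")
    return rate_spikes_per_s * vector_strength

# Hypothetical example: 255 spikes in a 5.1-s stimulus, vector strength of 0.5.
example = sync_rate(discharge_rate(255, 5.1), 0.5)
```

A neuron firing vigorously but without phase locking to the beat therefore has a low sync-rate, as does a well-synchronized neuron that fires only a few spikes.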

Psychophysical methods

Psychophysical data were collected on three normal-hearing adults, ages 28 to 48 years. All had pure-tone thresholds of 20 dB HL or better at octave frequencies 250–8,000 Hz (ANSI 1996), and none reported a history of ear disease. These listeners were highly practiced at high-frequency ITD tasks, having listened to 30+ h in a previous experiment on high-frequency envelope ITD discrimination. Two of the listeners were authors and the third listener a volunteer with years of psychoacoustical listening experience. Pilot data obtained with less experienced listeners suggest that a wide range of performance in this task would be obtained with a random sample of the population, but it was also our experience that performance improved dramatically over time with practice. As such, the data presented here likely represent asymptotic or near asymptotic performance rather than average performance.

Psychoacoustical testing included two paradigms, one in which the listeners were asked to discriminate binaural in-phase vs. out-of-phase envelope modulation (the ‘static’ task) and one in which they were asked to identify the direction of a binaural beat sweep (the ‘dynamic’ task). In both tasks, the stimulus was generated by multiplying a high-frequency carrier tone by a raised sinusoidal modulator tone. The carrier tone had an initial level of 70 dB SPL; once modulated, the total stimulus level was 71.76 dB SPL. The degree of frequency mismatch across ears was defined in 1/8 octave steps, and the geometric mean of carrier frequencies presented to the left and right ears was always 4,000 Hz.
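The two numerical relationships in this paragraph can be checked with a short sketch. It assumes 100% modulation depth, for which the average power of a SAM tone exceeds that of the unmodulated carrier by a factor of 1 + m²/2, and it assumes the mismatch is split symmetrically (in octaves) around the 4,000-Hz geometric mean. The function names are hypothetical, and the text does not say which ear received the higher carrier.

```python
import math

def sam_level_gain_db(depth=1.0):
    """Level increase (dB) of a SAM tone relative to its unmodulated carrier.
    Average power scales by 1 + depth**2 / 2."""
    return 10.0 * math.log10(1.0 + depth ** 2 / 2.0)

def mismatched_carriers(geometric_mean_hz, mismatch_octaves):
    """Low and high carrier frequencies separated by mismatch_octaves and
    centered geometrically on geometric_mean_hz."""
    half = mismatch_octaves / 2.0
    return geometric_mean_hz * 2.0 ** (-half), geometric_mean_hz * 2.0 ** half

total_level_db = 70.0 + sam_level_gain_db()   # about 71.76 dB SPL
low_hz, high_hz = mismatched_carriers(4000.0, 1.0)
```

For a 1-octave mismatch this gives carriers of roughly 2,828 and 5,657 Hz, whose geometric mean is 4,000 Hz.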

In the ‘static’ task, stimuli were 0.5 s in duration, including 40-ms cos² onset and offset ramps. The modulator tone was 128 Hz with a random starting phase chosen before each stimulus presentation. The envelope ITD was manipulated via the relative starting phase of the tone presented to each ear. The task was a four-interval, two-cue, two-alternative forced-choice. The envelope was in-phase for both cue intervals and one of the listening intervals. In one of the listening intervals, chosen at random, the envelope was binaurally out-of-phase. Observers were asked to identify the interval in which the stimulus was out-of-phase. Visual feedback was provided after each observer response. Percent correct was estimated for blocks of 25 trials each.

In the ‘dynamic’ task, stimuli were 1 s in duration, including 40-ms cos² onset and offset ramps. The modulator tone was either 128 or 129 Hz, and the ear receiving the higher frequency was randomly assigned on each interval. Modulation was in-phase at the beginning and at the end of each listening interval, and the phase transition defining the beat occurred over the course of the listening interval. The task was a one-interval, two-alternative forced-choice. Observers were asked to identify the starting direction of the lateralization sweep. The left button on the response box was pressed to indicate that the sweep began on the left and progressed to the right, and the right button was pressed to indicate the converse. Visual feedback was provided after each observer response. Percent correct was estimated for blocks of 50 trials each.

All testing took place in a double-walled booth. Stimuli were generated digitally in software, and played out of a DAC at 24.4 kHz (Tucker-Davis System III, Alachua, Florida). Sounds were delivered via ER2 insert phones (Etymotic, Elk Grove Village, IL).

RESULTS

Physiological experiments

The neural results were obtained from 72 neurons in the IC of 2 adult rabbits. Fifty-nine neurons were from single unit recordings and 13 were from small cluster recordings. The range of CFs was from 2.0 to 26.1 kHz.

Example of the data set collected

A data set collected for a representative neuron is shown in Figure 1. The tuning curve (Fig. 1A) shows that the CF was 3,363 Hz. The open circles represent the width of the tuning curve at 50 dB, which, as we will show below, was the best intensity for sensitivity to envelope ITDs. Figure 1B is a comparison of the rate-level functions using CF tones in three different configurations. For stimulation of the contralateral ear alone (X symbols), the response was monotonic. For stimulation of the ipsilateral ear alone (open symbols), there was no excitatory response and too little spontaneous activity to indicate an inhibitory response. When the intensity at the contralateral ear was held at 20 dB and the intensity at the ipsilateral ear varied (solid circles), the spike rate declined for high intensity stimulation, demonstrating that the response to ipsilateral stimulation was inhibitory.

FIG. 1

Complete data set for one neuron. A. Tuning curve of neuron with a CF of 3,363 Hz. Open circles represent 50 dB, which was the neuron’s best intensity. B. Rate level functions demonstrating a monotonic contralateral response (X symbols), limited ipsilateral response (open circles), and the inhibitory effect of ipsilateral stimulation when the contralateral ear was held constant at 20 dB (solid circles). C. ITD response to broadband noise, including neuronal response to fully correlated noise (solid circles) and anticorrelated noise (open circles). D. Response to binaural beat with modulation frequencies from 800 to 25 Hz. The neuron was tested at 100 Hz. E. Intensity response to the binaural beat; 50 dB was used for testing. F–G. Responses to mismatched frequencies in the binaural beat stimulus with the other ear held constant at 4,000 Hz. Frequencies where there was significant synchrony to the binaural beat stimulus (Rayleigh test of uniformity, p < 0.001) are represented by the solid circles; open circles did not meet the synchrony criterion. The solid line near the bottom is the range of frequencies with above-threshold response to tones. The vertical dashed line is the frequency that was held constant.

Broadband noise stimuli were used to determine that the neuron’s responses to stimulus ITD were driven by the envelope and not the fine structure delay. At an ITD of zero, the noises were either fully correlated (+1, solid circles, Fig. 1C) or anticorrelated (−1, open symbols). The responses to both correlated and anticorrelated noises were similar, indicating that the neuronal response can be attributed to the temporal properties of the stimulus envelope and was not affected by the polarity of the fine structure (Joris 2003).

To determine the best modulation frequency for sensitivity to envelope ITDs, binaural beat stimuli were presented in octave steps with mean modulation frequencies of 25 to 800 Hz and matching carrier frequencies in both ears (Fig. 1D). In the PST histograms, five peaks represent sensitivity to the phase of the envelope during the five cycles of the binaural beat stimulus. Such sensitivity is apparent across the range tested. The modulation frequency chosen for further testing for this neuron was 100 Hz. Figure 1E demonstrates the neuron’s response to changes in intensity of the binaural beat stimulus. The response to the phase of the envelope was nonmonotonic as a function of intensity with the greatest sensitivity at 50 dB.

The responses to mismatched carrier frequencies in the binaural beat stimulus are presented in Figure 1F and G. Each point represents the neuron’s response while varying frequencies in one ear with the frequency at the other ear held constant near the CF, in this case at 4,000 Hz. The solid circles indicate responses with significant synchrony to the 1-Hz binaural beat frequency (p < 0.001, Rayleigh test of uniformity, Mardia and Jupp 1999), whereas responses plotted with open circles did not meet the criteria for significance. The solid line near the bottom of each plot is the range of frequencies with above-threshold responses as a function of contralateral tone frequency (open circles in Fig. 1A). In Figure 1F, the frequency of the stimulus presented to the ipsilateral ear is indicated by a vertical dashed line. When varying the frequency at the contralateral ear, significant synchronization occurred for all frequencies tested, which covered a range of 3.25 octaves (707–6,727 Hz, Fig. 1F, left panel). Consistent with the shape of the tuning curve, significant synchrony extended to larger mismatches on the low-frequency side than on the high-frequency side. On the low-frequency side, the range for significant synchrony exceeded that expected from the tuning curve. The rate at these frequencies (Fig. 1F, middle panel) was low. Lower thresholds to synchronization than to rate are known to occur in auditory nerve fibers (Rose et al. 1967). The sync-rate curve (Fig. 1F, right panel) was similar in shape to the rate curve, as the rate response was more highly modulated than the synchronization.

In Figure 1G, the carrier frequency to the contralateral ear was held constant at the CF (dotted lines) whereas that to the ipsilateral ear was varied. Again, significant synchronization occurred for a wide range of mismatched frequencies (2.2 octaves, Fig. 1G, left panel). However, in contrast to when the carrier frequency at the contralateral ear was varied, the spike rate increased as the ipsilateral stimuli moved away from the CF (Fig. 1G, middle panel), presumably because the ipsilateral input was inhibitory (Fig. 1B) and maximal inhibition was evoked near the CF. The sync-rate curve (Fig. 1G, right panel) resembled the synchronization curve, which was more highly modulated than the rate curve.

Comparison of frequency range of responses to single tones and to mismatched carrier frequencies

The CF of most neurons was between 2 and 8 kHz (Fig. 2A, horizontal histogram). The best modulation frequencies ranged from 25 to 566 Hz (Fig. 2A, vertical histogram). No relationship was seen between the CF of a neuron and the best modulation frequency chosen for further testing (Fig. 2A). This result indicates that a wide range of modulation frequencies is represented across neurons with each CF (Langner and Schreiner 1988).

FIG. 2

Comparison of the frequency response to tones and synchronization to the binaural beat stimulus. A. The distributions of the CF to contralateral tones and best modulation frequency to the binaural beat stimulus. The CF for the majority of neurons tested was between 2,000 and 4,000 Hz. The modulation frequency responses were distributed from 25 to 566 Hz. No relationship was found between the neuron’s CF and the best modulation frequency. B. The comparison of the range of frequency response at the contralateral ear to the range of significant synchronization to a changing carrier frequency at the contralateral ear. Most neurons responded to a smaller range of frequencies for synchronization (i.e., most points lie below the diagonal, indicated by the solid line). The correlation between these measures was low (dotted line, r = 0.35).

It might be expected that a principal determinant of the range of frequency mismatches capable of producing sensitivity to interaural phase of the modulator would be the frequency selectivity of the neuron. However, as shown in Figure 1, the range of mismatched frequencies for significant synchronization could exceed that expected from the tuning curve. More commonly, when varying the contralateral ear frequency, the total range of mismatched frequencies capable of producing sensitivity to envelope ITDs was less than the range of excitatory responses to contralateral tones (Fig. 2B). The frequency tuning width (abscissa) was determined based on isointensity frequency sweeps in the contralateral ear at the intensity used to measure responses to mismatched carriers or from the width of tuning curves at that intensity (as in Fig. 1A). For the modulated tones (ordinate), the carrier frequency was varied in the contralateral ear with the ipsilateral frequency at CF. Points that lie below the line of equality indicate a smaller frequency range for envelope ITD sensitivity than for tuning curves. The overall correlation between the two measures was low (r = 0.35). These results suggest that factors other than the frequency tuning of the neuron influence the ability to encode envelope ITDs for binaurally mismatched carrier frequencies.

Effects of stimulus intensity

For each neuron, the effect of intensity on synchronization and rate was assessed with matched carrier frequencies and at the best modulation frequency. The degree of nonmonotonicity of rate was computed as the ratio between the response at the highest intensity and the maximal response at any intensity (Sutter et al. 1995), and a similar metric was used for synchronization. A neuron’s response was classified as nonmonotonic if the rate or synchronization ratio was less than 0.7. Most neurons (59/72 or 82%) showed nonmonotonic synchronization functions with intensity (Fig. 3A, left). More than 1/2 showed ratios less than 0.1, indicating little or no significant synchronization at the highest intensity. In contrast, most neurons (51/72 or 71%) showed monotonic rate functions (Fig. 3A, right). The most common response across neurons was a nonmonotonic level function for synchronization and a monotonic function for rate (Fig. 3B, 44/72 neurons). Thus, in most neurons, the greatest precision of temporal encoding of ITDs in the envelopes occurred when the neuron was firing less than maximally. The next most common combination was nonmonotonic functions for both synchronization and rate (Fig. 3C, 15/72). Relatively few neurons had monotonic responses for synchronization and nonmonotonic response to rate (Fig. 3D, 9/72) and even fewer showed monotonic responses for both synchronization and rate (Fig. 3E, 4/72).
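The monotonicity classification described above can be sketched as follows. The function names and the example level functions are hypothetical; only the ratio definition and the 0.7 criterion come from the text.

```python
def monotonicity_ratio(responses_by_level):
    """Ratio of the response at the highest intensity to the maximal response
    at any intensity. Responses must be ordered from lowest to highest level."""
    peak = max(responses_by_level)
    if peak <= 0:
        raise ValueError("level function has no positive response")
    return responses_by_level[-1] / peak

def is_nonmonotonic(responses_by_level, criterion=0.7):
    """Classify a level function as nonmonotonic when the ratio falls below the criterion."""
    return monotonicity_ratio(responses_by_level) < criterion

# Hypothetical level functions sampled in 10-dB steps.
sync_by_level = [0.1, 0.6, 0.8, 0.3]      # synchronization peaks, then declines
rate_by_level = [5.0, 10.0, 20.0, 30.0]   # rate keeps growing with level
```

Applied to these example values, the synchronization function has a ratio of 0.375 and is classified as nonmonotonic, whereas the rate function has a ratio of 1.0 and is classified as monotonic.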

FIG. 3

Level functions to synchronization and rate. A. Distributions of the monotonicity ratio (see text) to synchronization and rate. B. Nonmonotonic response to synchronization and a monotonic response to rate for three neurons: 1) CF of 2,378 Hz and AM was 25 Hz (squares), 2) CF of 2,000 Hz and AM was 283 Hz (solid circles), 3) CF of 2,828 Hz and AM was 70 Hz (open circles). C. Nonmonotonic responses to synchronization and rate for two neurons: 1) CF of 2,000 Hz and AM was 400 Hz (solid circles), 2) CF of 4,000 Hz and AM was 100 Hz (open circles). D. Monotonic response to synchronization and nonmonotonic response to rate. CF of 5,656 Hz and AM was 566 Hz. E. Monotonic responses to synchronization and rate. CF of 6,727 Hz and AM was 566 Hz.

In a subset of neurons (23/72), we measured complete response areas to mismatched frequencies across intensities. Figure 4 is an example of strongly nonmonotonic synchronization to the beat frequency across the response area as the carrier in the contralateral ear was varied. Strong synchronization was restricted to the range of 20–50 dB and mismatches within 0.25 octaves of the CF (5,656 Hz), although significant synchrony was obtained for larger carrier frequency mismatches and a wider range of intensities. The spontaneous rate was near zero, so many frequency/intensity combinations had considerable response rates with little synchronization.

FIG. 4

The response to the binaural beat stimulus across a range of carrier frequencies and intensities in an example neuron. The CF was 5,656 Hz. The ipsilateral carrier frequency was held at the CF (bold on frequencies indicated at top) whereas the contralateral carrier frequency was varied. The modulation frequency was 200 Hz. Strong synchronization was limited to 20–50 dB.

Comparison of carrier frequencies varied at either the contralateral or ipsilateral ear

Similar ranges of frequency mismatches could generally be tolerated when the carrier frequency was varied at the contralateral ear or when the carrier frequency was varied at the ipsilateral ear (Fig. 5). In both cases, frequency ranges for significant synchronization (Rayleigh test of uniformity, p < 0.001) that exceeded 2 octaves were common and some neurons had ranges that exceeded 3 or even 4 octaves. These distributions represent all of the functions obtained when varying the frequencies at each ear, across all intensities. The median value for the contralateral ear was 1.82 octaves compared to 1.25 octaves for the ipsilateral ear. This difference was not significant (Mann–Whitney rank sum test, p > 0.05). When the comparison was restricted to neurons in which the frequency was varied at each ear (n = 32), varying the contralateral frequency again yielded a slightly larger median value (2.0 vs. 1.5 octaves) but the difference was not significant.
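The significance criterion for synchronization can be illustrated with a short sketch. The text does not state how the Rayleigh test was computed, so the code below assumes the common first-order approximation p ≈ exp(−nR²), where R is the synchronization index and n the number of spikes. The function names are hypothetical.

```python
import math


def rayleigh_p(vector_strength, n_spikes):
    """Approximate Rayleigh-test p-value, p ~ exp(-n * R**2).

    Assumption: first-order large-sample approximation, which may differ
    from the exact procedure used in the study.
    """
    if not 0.0 <= vector_strength <= 1.0:
        raise ValueError("vector strength must lie between 0 and 1")
    if n_spikes <= 0:
        raise ValueError("need at least one spike")
    return math.exp(-n_spikes * vector_strength ** 2)


def is_significant(vector_strength, n_spikes, alpha=0.001):
    """True when the approximate p-value falls below the criterion in the text."""
    return rayleigh_p(vector_strength, n_spikes) < alpha
```

Under this approximation, a modest synchronization index can still be significant when many spikes are collected, which is why significant synchrony can extend over a wider frequency range than strong synchrony.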

FIG. 5

Distribution of the range of mismatched frequencies that produced significant synchronization to the binaural beat stimulus. A. Contralateral ear varied. B. Ipsilateral ear varied.

To obtain an indication of the population response, we plotted the average synchronization, rate, and sync-rate as a function of the distance from CF in octaves across neurons (Fig. 6). This analysis essentially normalizes for CF and differences in the best intensity. Each point includes all neurons tested at these values, and thus includes both significant and nonsignificant synchronization. At least 10 neurons are included for each data point. When the carrier frequency at the contralateral ear was varied, the synchronization of the population (Fig. 6A) was relatively flat across carrier frequencies. The average rate (Fig. 6B) was highest when the carrier frequencies were matched across ears, declined steeply for separations up to 1 octave, and declined more slowly for greater separations. The sync-rate (Fig. 6C) was similar to the rate alone, and showed the greatest response with matched carriers and a two-stage decline as the carrier frequencies were separated. When the carrier frequency at the ipsilateral ear was varied, matched frequencies showed the highest average synchronization (Fig. 6D) but the function was relatively flat. The rate increased for separations greater than 1 octave below CF, presumably because of a reduction of inhibition. Again, the sync-rate showed the greatest response when the carrier frequencies were matched and decreased when mismatched carrier frequencies were presented to the ears.
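The normalization to distance from CF in octaves used for this population analysis is a base-2 logarithm of the frequency ratio. The helper names below are hypothetical and are offered only as a minimal sketch of that conversion.

```python
import math


def octaves_from_cf(carrier_hz, cf_hz):
    """Signed distance of a carrier from CF in octaves (positive = above CF)."""
    if carrier_hz <= 0 or cf_hz <= 0:
        raise ValueError("frequencies must be positive")
    return math.log2(carrier_hz / cf_hz)


def carrier_at_offset(cf_hz, octaves):
    """Carrier frequency lying `octaves` above (or, if negative, below) CF."""
    return cf_hz * 2.0 ** octaves
```

For example, a carrier 0.25 octaves above a CF of 5,656 Hz lies near 6,726 Hz, and a carrier of 4,000 Hz is exactly 1 octave above a CF of 2,000 Hz.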

FIG. 6

Average synchronization, rate, and synchronization rate as a function of distance from CF. Error bars indicate the standard error of the mean. A/D. Synchronization as a function of carrier separation was relatively flat for changes introduced in either ear. B/E. Rate responses were highest when the carrier frequencies for both ears were matched for contralateral changes, but this function was relatively flat for ipsilateral changes. C/F. The synchronization rate was highest at zero and declined quickly with carrier frequency separation for both contralateral and ipsilateral changes.

The relatively flat synchronization curves for mismatched carriers at either ear mean that the precision of the temporal information encoded does not change with mismatch. Instead, mismatched carriers had a greater effect on the rate and the sync-rate.

Response phases

In some neurons, the interaural phase of the response changed systematically with mismatches in the carrier frequencies. Typically, the phase changes were in opposite directions when the carrier frequency was varied in the left and right ears. However, the phase changes were negligible in some neurons and the direction of the shift for the changes in the carrier frequencies at the left and right ears varied across neurons, so that the average phase shift in the population was small.

Behavioral experiments

Data from the psychophysical tasks are shown in Figure 7 with those from the static task in Figure 7A and those from the dynamic task in Figure 7B. Estimates of percent correct are plotted with symbols as a function of carrier frequency separation, with each of the three observers’ data plotted separately. Data were collected for the lower-frequency carrier presented to the left or to the right ear; these data were not significantly different and so were averaged. Each point represents the average of 100–300 stimulus presentations. Because data were collected using a two-alternative forced-choice procedure, chance performance is at 50%. The solid lines show fits to the data with a Logit function, defined as:

$$p(x) = \frac{1}{n} + \frac{1 - 1/n}{1 + e^{-(x - \mu)/k}},$$
(1)

where μ is the mean of the underlying cue distribution, k is the slope parameter, x is the signal magnitude, and n is the number of intervals in an n-alternative task.
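Equation 1 can be evaluated directly, as in the sketch below. The function name and the example parameter values are hypothetical and are not the fitted values from the study. The sketch also assumes that x is the carrier separation in octaves and that k is negative, so that performance falls from near 100% toward chance as separation grows.

```python
import math


def logit_fit(x, mu, k, n=2):
    """Evaluate Eq. 1: p(x) = 1/n + (1 - 1/n) / (1 + exp(-(x - mu) / k)).

    `n` is the number of intervals, so 1/n is chance performance.
    Assumption: k < 0 gives a function that decreases with x.
    """
    chance = 1.0 / n
    return chance + (1.0 - chance) / (1.0 + math.exp(-(x - mu) / k))


# Illustrative parameters only: midpoint at 1.5 octaves, slope parameter -0.2.
p_small_separation = logit_fit(0.0, mu=1.5, k=-0.2)
p_midpoint = logit_fit(1.5, mu=1.5, k=-0.2)
p_large_separation = logit_fit(4.0, mu=1.5, k=-0.2)
```

For a two-alternative task the function reaches 75% correct at x = μ, halfway between chance (50%) and perfect performance.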

FIG. 7

Results of psychophysical tests, plotted as the mean percent correct (circles) and best fitting Logit function (lines). A. Results for the ‘static’ stimulus. B. Results for the ‘dynamic’ stimulus.

For the static task, the ability to identify the out-of-phase stimulus envelope fell off with increasing carrier frequency separation, but there were individual differences in performance. Observer 2 performed at chance by a carrier separation of 2.5 octaves, whereas observer 1 performed above chance (approximately 60% correct) at a carrier separation of 3 octaves. In the dynamic task, the ability to discriminate beat direction also fell off as a function of carrier frequency separation, with deviations from 100% evident at very small carrier separations. Performance reached chance (50%) for separations near 2 octaves for all listeners.

An additional data point was collected in both the static and dynamic tasks for a 1-octave carrier separation, but with both the low- and high-frequency carriers presented to the same (left) ear. Performance was at chance (50%) with this monaural stimulus for the dynamic task, but performance of the static task was quite good monaurally (approximately 100%). This result suggests that the static task could be performed based on across-channel cues that are not inherently binaural in nature, such as across-channel envelope coherence (Strickland et al. 1989). This finding raises the possibility that performance of the static binaural task might not reflect ‘true’ binaural processing, relying instead on some other across-channel cues. In contrast, the dynamic task could not be performed based on monaural cues, and so results in this condition provide a more conservative estimate of the binaural processing limits.

DISCUSSION

We have demonstrated that usable ITD information can be obtained both neurally in rabbits and behaviorally in humans with a range of carrier frequency mismatch between the ears of as much as 2–3 octaves. This similarity suggests that the physiological data obtained in rabbits represent the same processing limits at work in the human data. A large range of tolerated mismatches is in accord with the only previous physiological investigation of this issue (Yin et al. 1984). It is also in accord with a psychophysical study, which showed that large mismatches could be tolerated when detecting the presence of a binaural beat in the modulation frequency (McFadden and Pasanen 1975). However, in that study, the full extent of mismatches capable of producing detectable binaural beats was not systematically tested. The most extensive previous psychophysical studies (Henning 1974; Neutzel and Hafter 1981) measured the threshold ITD to a given carrier frequency mismatch. These studies showed that thresholds increased greatly even with small carrier mismatches, a result that may be reflected in our physiological finding that the maximal average sync-rate was obtained with matched carriers. Most of the decline in sync-rate occurred for mismatches of about 1/4 octave, a value that results in a large increase in ITD threshold behaviorally. Thus, the main results from this study are in general agreement with previous studies.

For bilateral cochlear implant users, the implications are that some ITD information should be available even without perfect place/pitch matching. Certainly we would expect accurate place/pitch matching to yield better results, as has been shown in some studies (Long et al. 2003; Wilson et al. 2003). Other studies, however, have shown that ITD information can be obtained even over a range of as many as eight contacts, spanning many millimeters in the cochlea (van Hoesel and Clark 1997). The dissimilarity in outcomes between the studies may be because of individual differences in factors such as spread of excitation, channel interactions, and neural survival.

The use of stimuli optimized for maximum effect on the neuronal responses

Stimuli used in the physiological experiments were optimized for each recording by determining the best modulation frequency and intensity before determining the effect of carrier frequency mismatches. The assumption in optimizing the stimulus is that each neuron is representative of the set of neurons that would provide the bulk of the information if a standard stimulus were used. That is, by optimizing the responses, we are “normalizing” for CF, modulation rate, intensity, etc. so that every neuron can be considered as part of the population that would respond nearly maximally to any chosen standard stimulus. An alternative design is to use the same stimulus across neurons to obtain a true population response. However, differences in tuning to modulation rate and intensity mean that far fewer neurons will respond well to the chosen stimulus. Data collection is already challenging, as only neurons with high characteristic frequencies were included, and of these, a minority show sensitivity to ITDs in the envelope. Yin and Kuwada (1984) reported that 24 of 58 high-frequency neurons tested showed sensitivity to ITDs in the envelope. Although we did not collect such information systematically, based on our experience, the proportion in the unanesthetized rabbit is also less than 1/2 of all high-frequency neurons. The effect of the optimization is that the physiological responses shown here may overestimate the actual population response to any given standard stimulus. However, we did include neurons that, even after optimization, responded with low synchronization and/or rates to changes in the interaural phase of the envelope. Thus, it is likely that the samples of responses to our optimized stimuli are representative of the responses from neurons showing relatively strong activation to a standard stimulus. We took care to include only neurons that responded to ITDs in the envelope rather than to ITDs in the fine structure. These precautions included limiting the sample to neurons with high CF, characterizing the responses with noise, and using low modulation frequencies when possible.

Neural sensitivity to envelope ITDs using mismatched carrier frequencies

Three main measures were used to assess sensitivity to ITDs using binaural beats with SAM tones. These were the synchronization index, the response rate, and the sync-rate (i.e., the product of the synchrony and rate). The synchronization index is directly related to the transmission of temporal information. The rate is a measure of the overall responsiveness to the stimulus. The sync-rate incorporates both features and estimates the proportion of the total ITD sensitivity that can be attributed to each frequency (Yin et al. 1986). Across neurons, the synchronization index tended to be relatively flat for wide ranges of mismatched frequencies. This result suggests that as long as binaural interactions occur with a particular set of carrier frequencies, the main feature affecting the temporal precision is the modulation waveform. When the carrier frequency at the contralateral ear was varied, the rate declined as a function of the distance from the CF. The opposite often occurred when the carrier at the ipsilateral ear was varied, as the inhibitory drive was reduced. The sync-rates reflected these trends. Thus, to a first approximation, the precision of the timing information is controlled by sensitivity to the modulating waveform, whereas the response rate is controlled by sensitivity to the carriers. This was also the conclusion reached by Yin et al. (1984). However, in our study, there was only a weak association between the frequency ranges for responses to single tones and the range of frequencies associated with significant synchronization to binaural envelope ITDs. An important additional factor is intensity, as described in the next section.
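The three measures can be sketched from a list of spike times. The code assumes that the synchronization index is the conventional vector strength computed at the beat frequency. The function name, argument names, and example spike times are hypothetical.

```python
import cmath
import math


def beat_measures(spike_times_s, beat_hz, duration_s):
    """Return (synchronization index, rate in spikes/s, sync-rate).

    Assumption: the synchronization index is the vector strength of the
    spike phases relative to the beat cycle.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    n = len(spike_times_s)
    rate = n / duration_s
    if n == 0:
        return 0.0, 0.0, 0.0
    # Represent the beat-cycle phase of each spike as a unit vector.
    vectors = [cmath.exp(2j * math.pi * beat_hz * t) for t in spike_times_s]
    sync = abs(sum(vectors)) / n
    # The sync-rate is the product of synchrony and rate, as defined in the text.
    return sync, rate, sync * rate


# Illustrative spike trains for a 4-Hz beat over 1 s (not data from the study).
locked = beat_measures([0.0, 0.25, 0.5, 0.75], beat_hz=4.0, duration_s=1.0)
opposed = beat_measures([0.0, 0.125], beat_hz=4.0, duration_s=1.0)
```

Spikes that always fall at the same beat phase give a synchronization index near 1, whereas spikes at opposite phases cancel and give an index near 0 even though the rate is nonzero.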

Effects of sound intensity

Because of spread of excitation, a prediction was that an increased level would increase the tolerance to mismatched carrier frequencies. However, whereas the change in firing rate with level was typically monotonic, the synchronization to the beat frequency was typically nonmonotonic. This result meant that level did not necessarily increase the range of mismatches tolerated or the degree of synchronization of the IPD-sensitive responses.

The loss of binaural synchronization to the envelope with increased intensity parallels the loss of monaural synchronization seen in the auditory nerve (Smith and Brachman 1980; Joris and Yin 1992; Dreyer and Delgutte 2006). Thus, intervening processing, such as an increase in synchronization in the cochlear nucleus compared to the auditory nerve (Joris et al. 1994a, b), does not appear to eliminate this peripheral limitation. The loss of synchronization with level in the periphery corresponds to saturation of the rate-level functions, such that, for a high average intensity, a change in amplitude is poorly represented in the firing rate (Dreyer and Delgutte 2006). It is interesting to note that the peripheral loss of synchronization at high levels does not result in a behavioral loss of sensitivity to AM with increasing level in normal-hearing listeners (Viemeister 1979; Kohlrausch 1993). It is possible that off-CF neurons where the rate-level functions are not saturated can mediate the detection of the AM stimulus at high stimulus levels. For the binaural condition, this suggestion could be tested physiologically by moving the carriers at both ears off-CF and measuring responses as a function of level, a condition we have not yet tested.

Saturation of firing rate could also be a limiting factor for synchronization to the envelope in electrical hearing, as a result of the limited dynamic range of neural firing (van den Honert and Stypulkowski 1984). In addition, the nonlinear signal processing performed by the speech processor may inadvertently introduce saturation-like effects with detrimental consequences on envelope ITD encoding. Because the dynamic range of electric hearing is compressed relative to the range of environmentally relevant acoustic signals, the speech processor compresses the incoming signal, reducing the fidelity of intensity coding. Any loss of monaural synchronization would be even more important for binaural processing because the process of coincidence detection produces binaural synchronization that is approximately the square of the monaural synchronies (Batra et al. 1997; Batra and Yin 2004). Implant listeners tend to have acute sensitivity to the presence of amplitude modulation (Shannon 1992), but there is some evidence that the ability to detect small changes in stimulus rate is relatively poor (Zeng 2002). Furthermore, in at least one case, better ITD discrimination was found for an electrode pair that was mismatched in place/pitch than for the matched pair (Long et al. 2003). It is tempting to suggest that in this case, the matched pair was in the saturated range and the better result with the mismatched pair was caused by moving into a nonsaturated state, similar to off-CF listening. The possibility of off-CF listening with implants could be restricted because of poor channel separation or decreased neural survival.
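The squaring relationship noted above for coincidence detection can be made concrete with a short worked example. It assumes equal monaural synchronization at the two ears, a simplification made here for illustration. The symbols R_mon and R_bin are introduced only for this example.

$$R_{\mathrm{bin}} \approx R_{\mathrm{mon}}^{2}: \qquad R_{\mathrm{mon}} = 0.8 \Rightarrow R_{\mathrm{bin}} \approx 0.64, \qquad R_{\mathrm{mon}} = 0.4 \Rightarrow R_{\mathrm{bin}} \approx 0.16.$$

Under this approximation, halving the monaural synchronization reduces the binaural synchronization to about one quarter, so a modest monaural loss would be expected to have a disproportionate binaural cost.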

A further possible complication of level in bilateral cochlear implants is hypersynchronization to amplitude modulation for high carrier pulse rates (Litvak et al. 2001, 2003). Processing of ITDs is typically done by neurons that are broadly tuned to ITDs using a population code, rather than by narrowly tuned elements (Fitzpatrick et al. 1997; McAlpine et al. 2001; Bala et al. 2003). Decreasing the range of ITDs over which neurons can show a graded response could disrupt the operation of this population code. In addition, when the inputs fire with a high degree of synchrony and on a high proportion of cycles, the convergence of endings onto binaural neurons in the superior olivary complex (Kiss and Majorossy 1983; Majorossy and Kiss 1990) would suggest an increased probability of coincidences because of inputs from the same side, degrading the ability to encode binaural information.

Encoding of interaural phase with mismatched carriers

Changes in the carrier frequency often had a systematic effect on the phase of the encoding of envelope ITD. The phase shifts associated with varying the carrier frequencies at the ipsilateral and contralateral ears were typically opposite in direction, but were not consistent across neurons. Because different neurons showed phase shifts in different directions, the average difference across neurons was small. Such phase shifts were not reported in the previous physiological study with mismatched carriers (Yin et al. 1984). They presumably arise at least in part because of phase curvature associated with changes in speed of the traveling wave to frequencies above and below CF (e.g., van der Heijden and Joris 2006). However, differences between neurons are difficult to explain by peripheral mechanisms and, at present, the source of these differences is not clear.

Behavioral results

Psychophysical results collected on human listeners are broadly consistent with the physiological data. Although performance on the two envelope ITD tasks fell off as a function of interaural carrier frequency separation, performance was above chance for separations up to 2 octaves or more. Tolerance of carrier frequency mismatches was stronger for the static than the dynamic task. The dynamic task could be viewed as more perceptually demanding than the static task, to the extent that it was based on the interaural envelope configuration as a function of time. It has been shown that temporal resolution of dynamic changes in ITD is relatively poor, leading some to characterize binaural processing as sluggish (e.g., Kollmeier and Gilkey 1990). No such sluggishness has been observed at the level of the IC (Yin and Kuwada 1983; Joris et al. 2006). Regardless of the fidelity with which dynamic cues can be processed, the duration of robust ITD presentation varied significantly across tasks: the ITD swept through all possible values in the dynamic task, whereas the maximal ITD was presented over the duration of the stimulus in the static task. The use of dynamic stimuli could have increased task difficulty and thus increased susceptibility to detrimental effects of carrier frequency mismatches in the psychophysical as compared to the physiological data.

Another potentially important difference between the static and the dynamic psychophysical paradigms is the availability of cues. Monaural discrimination of in-phase vs. out-of-phase modulation across frequency suggests that the static task can be performed by across-channel differences that do not rely on specifically binaural cues. For example, results are consistent with the possibility that listeners are performing an across-frequency envelope comparison (Strickland et al. 1989) in both the monaural and binaural versions of the static task. In contrast, the dynamic task cannot be performed reliably with monaural cues alone, supporting the conclusion that this task is more likely performed based on uniquely binaural cues underlying localization.

Summary

The present results show an ability to utilize interaural envelope phase timing information delivered with mismatched carrier frequencies bilaterally both in an animal neural model and in human listeners. Because of the limited spatial resolution achieved with current cochlear implant technology, inputs to binaural neurons from mismatched electrode contacts are likely to produce significant interactions even with wide separations because of the spread of current. The physiological data also suggest that control of intensity to avoid saturation effects associated with synchronization to the envelope is an important consideration for bilateral implant stimulation.