INTRODUCTION

Present-day cochlear prostheses can provide a high level of speech recognition in patients that have severe-to-profound hearing loss. Nearly all prostheses that are in clinical use employ multiple stimulating channels, where a channel refers to a signal pathway involving one or more active and return electrodes; present-day implants comprise between 8 and 22 intracochlear electrodes, with or without an extracochlear reference electrode. Multiple-channel cochlear prostheses permit substantially better speech recognition than do single-channel prostheses (Gantz et al. 1988; Cohen et al. 1993; Fishman et al. 1997), indicating that multiple-channel stimulation enhances transmission of speech-related information. Nevertheless, tests of speech recognition indicate that performance is not improved by increasing the number of activated channels beyond seven in quiet listening conditions (Fishman et al. 1997; Friesen et al. 2001) or beyond ten in the presence of background noise (Friesen et al. 2001). That finding suggests that the number of functionally independent channels often is less than the number of stimulated channels.

The earliest multichannel prostheses stimulated multiple electrodes simultaneously using continuous analog waveforms. The Compressed Analog (CA) strategy, for instance, processed sound with a bank of bandpass filters, compressed the filter outputs in amplitude to reduce the dynamic range, and then presented all the filter outputs simultaneously to multiple implant electrodes, one filter per electrode. A similar approach is used in the more recent Simultaneous Analog Signal (SAS) strategy (Zimmerman–Phillips and Murad 1999). Simultaneous stimulation of two or more nearby implant electrodes produces vector summation of electrical current fields, thereby resulting in functional interaction among channels. Wilson et al. (1991) developed a pulsatile stimulation strategy (Continuous Interleaved Sampling; CIS) that was intended to eliminate such channel interaction. In that strategy, each electrode was stimulated with an amplitude-modulated train of electrical pulses. Pulse trains on multiple electrodes were interleaved in time so that no two electrodes received simultaneous electrical currents. Most patients showed substantial improvement in speech recognition using the nonsimultaneous pulsatile CIS strategy compared with the simultaneous analog CA strategy (Wilson et al. 1991). Many contemporary speech processors for cochlear implants employ some form of interleaved pulsatile stimulation, although the SAS strategy is favored by some patients (Battmer et al. 1999; Osberger and Fisher 1999).

One interpretation of the success of patients in using an interleaved pulsatile strategy is that it eliminates direct electrical summation among multiple electrodes and reduces or eliminates interaction among the stimulated cochlear neural populations. In that way, the strategy increases the number of effectively independent channels of information that are transmitted to the brain. Nevertheless, CA, SAS, and CIS stimulation strategies have several other fundamental differences in addition to simultaneous versus nonsimultaneous stimulation, including continuous versus pulsatile waveforms and slower versus faster stimulation rates. For that reason, improved performance in the CIS strategy might not be attributable entirely to the nonsimultaneous stimulation. Moreover, several psychophysical studies have demonstrated that two or more channels, even when stimulated in an interleaved pattern, can show substantial interaction in regard to listeners’ reports of pitch (Townshend et al. 1987; McDermott and McKay 1994; McKay and McDermott 1996) or loudness (Shannon 1983; McKay et al. 2001). Therefore elimination of direct electrical summation does not entirely eliminate channel interaction.

We tested the hypothesis that increasing the temporal separation of stimulus pulses on multiple cochlear implant electrodes reduces interactions among channels. The design of clinical speech processors does not permit simple comparisons in patients of pulsatile stimulation in simultaneous versus nonsimultaneous configurations, nor have there been parametric studies in patients of the influence of interchannel timing on channel interaction. For that reason, we developed an animal model in which to examine channel interactions parametrically at a basic physiological level (Bierer and Middlebrooks 2002; Middlebrooks and Bierer 2002). We chose to monitor responses to cochlear stimulation by recording from the auditory cortex. The auditory cortex is arguably not the best place to study detailed biophysical mechanisms of channel interaction inasmuch as it is many synaptic levels removed from the site of electrical stimulation; auditory nerve recordings might be better suited for detailed biophysics. Nevertheless, study of the cortex offers several advantages. Cortical responses reflect the integrated activity of the ascending auditory pathway, thereby providing a sample of the neural activity that reaches the forebrain and likely contributes to an animal’s perception and behavior. Also, the straightforward tonotopic organization of the cortex aids in placing a recording site near the representation of the lowest-threshold site for any particular cochlear stimulus; in contrast, the tonotopy of the auditory nerve is less accessible, particularly in a deafened ear. Finally, electrical artifacts from cochlear stimulation complicate auditory nerve recordings, whereas cortical recordings are displaced from the electrical artifact by greater physical distance and by longer latency in the neural pathway.

We implanted anesthetized guinea pigs with arrays of six intracochlear electrodes and recorded neural spike activity from the auditory cortex with 16-channel recording probes oriented along the cortical tonotopic axis. The dependent variable was the cochlear current threshold that produced just-detectable elevation of cortical spike rates as determined by Receiver Operating Characteristic (ROC) analysis. We tested pairs of stimulated channels that varied in interchannel delay from 0 µs (i.e., simultaneous) to 2000 µs. We also tested the influence of spatial separation of electrodes and compared monopolar, bipolar, and tripolar electrode configurations.

As expected, interaction between implant channels generally was greatest when pulses were presented simultaneously on two channels, although there were exceptions in bipolar and tripolar conditions. Nevertheless, a single subthreshold pulse could have appreciable influence on the response to a pulse that trailed in time by up to 640 µs or more. The spatial and temporal spread of channel interaction varied considerably with electrode configuration.

METHODS

The basic procedures for cochlear stimulus presentation and multichannel cortical recording were similar to those in our previous work (Bierer and Middlebrooks 2002).

Anesthesia and surgery

Data were collected from 8 healthy adult pigmented guinea pigs (500–900 g). In each animal, intracochlear deafening, cochlear implantation, and cortical recordings were performed in one session lasting up to 20 h. Animals were anesthetized initially with an intramuscular injection of ketamine hydrochloride (40 mg/kg) and xylazine (10 mg/kg). Supplementary injections of a 9:1 mixture of ketamine and xylazine were used to maintain an areflexive state. The left ear was deafened by withdrawing perilymph from the basal cochlear turn, then infusing a 10% solution of neomycin sulfate into the scala tympani; more than 2 h passed between neomycin infusion and the beginning of data collection. We did not routinely test for residual hearing, but in our experience with other guinea pigs, that dose of neomycin eliminates sound-evoked auditory brainstem responses within a few minutes after treatment. Also, Nuttall et al. (1977) demonstrated that perilymphatic perfusion of a much lower concentration of neomycin sulfate (10 mM; ~0.6%) eliminated sound-evoked cochlear microphonics within 60 min.

A 6-electrode intracochlear array (Cochlear Corporation, Englewood, CO) was inserted into the scala tympani of the left ear through a cochleostomy. The electrodes were platinum–iridium bands centered at 750-µm intervals and were numbered 1–6 from base to apex. A ground wire was placed in a neck muscle. The right auditory cortex was exposed, a multielectrode recording probe (described below) was inserted with reference to surface landmarks, and the cortical surface was covered with agarose (20 mg agarose/ml Ringer's solution).

All procedures were in accordance with the policies of the University of Michigan University Committee on Use and Care of Animals.

Stimulus generation

Experiments were controlled by a personal computer interfaced with Tucker-Davis hardware (Tucker-Davis Technologies, Gainesville, FL). Stimuli were controlled using custom software running in MATLAB (Mathworks Inc., Natick, MA). The signal to each active or return electrode was generated by one channel of an 8-channel digital-to-analog converter (TDT DA8). Each channel was coupled to an intracochlear electrode through an independent, custom-made, optically isolated current source with capacitance-coupled output. Experiments were conducted in a sound-attenuating chamber.

Stimuli consisted of single biphasic, charge-balanced pulses. Phase durations were 80 µs/phase. Pulses on active electrodes were initially cathodal except when stated otherwise. Three electrode configurations of the electrical stimulus were employed. In the monopolar configuration (MP), the active electrode was a single intrascalar electrode and the return was a wire positioned in a neck muscle. In the bipolar configuration (BP), the active electrode was one intrascalar electrode and the return was the adjacent, more apical intrascalar electrode. That configuration was referred to as BP+0 in our previous report (Bierer and Middlebrooks 2002), and for simplicity is referred to here as BP. In the tripolar configuration (TP), the active electrode was a single intrascalar electrode and the return consisted of the two adjacent electrodes (one on each side of the active electrode), each carrying half of the return current. Electrical field models and physical measurements predict that stimuli presented at a constant current level would produce increasingly more diffuse electrical fields as configurations were varied from TP to BP to MP (Kral et al. 1998; Jolly et al. 1996; Spelman et al. 1995). Absolute current levels are expressed as peak current in decibels (dB) relative to 1 mA.
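The level convention can be illustrated with a short conversion sketch. It assumes the usual 20·log10 amplitude convention for current, which the text does not state explicitly, and the function names are hypothetical.

```python
import math

REF_MA = 1.0  # reference current of 1 mA, as stated in the text


def ma_to_db(peak_ma: float) -> float:
    """Convert peak current in mA to dB re 1 mA (assumes 20*log10 convention)."""
    return 20.0 * math.log10(peak_ma / REF_MA)


def db_to_ma(level_db: float) -> float:
    """Convert a level in dB re 1 mA back to peak current in mA."""
    return REF_MA * 10.0 ** (level_db / 20.0)
```

Under that assumption, a level of −6 dB re 1 mA corresponds to a peak current of roughly 0.5 mA.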

A stimulus channel in this report refers to a signal pathway originating in the digital-to-analog converter and terminating in an active electrode and its complement of return electrode(s). Channel number corresponds to the number of the active electrode; BP channel 2, for instance, consists of active electrode 2 and more apical return electrode 3. On a given trial, either one or two cochlear implant channels were stimulated. In this study, the more apical channel always was channel 5, and the more basal channel was either channel 2 or 3, as indicated. The separation between active electrodes was 1.5 or 2.25 mm when the basal channel was channel 3 or 2, respectively. In two-channel conditions, the same electrode configuration was used on both channels.
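The channel definitions above can be summarized in a small sketch. The electrode pitch (750 µm) and the return-electrode rules come from the text; the function names and the convention of returning an empty list for the extracochlear MP return are illustrative assumptions.

```python
PITCH_MM = 0.75      # center-to-center electrode spacing (750 µm, from the text)
N_ELECTRODES = 6     # electrodes numbered 1 (basal) to 6 (apical)


def return_electrodes(active: int, config: str) -> list:
    """Intracochlear return electrodes for a channel named by its active electrode."""
    if config == "MP":
        return []                      # return is the extracochlear neck-muscle wire
    if config == "BP":
        returns = [active + 1]         # adjacent, more apical electrode
    elif config == "TP":
        returns = [active - 1, active + 1]  # flanking electrodes, half the current each
    else:
        raise ValueError("config must be 'MP', 'BP', or 'TP'")
    if any(e < 1 or e > N_ELECTRODES for e in returns):
        raise ValueError("channel does not fit on the 6-electrode array")
    return returns


def separation_mm(active_a: int, active_b: int) -> float:
    """Distance between the active electrodes of two channels."""
    return abs(active_a - active_b) * PITCH_MM
```

For example, `separation_mm(5, 3)` gives 1.5 mm and `separation_mm(5, 2)` gives 2.25 mm, matching the two separations used here.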

Current levels were varied over a range from below threshold to 5 to 15 dB above threshold in either 1- or 2-dB steps. In each block of trials, electrode configurations were held constant, and stimulus current level, active electrode separation, and temporal offset were varied among trials. Every combination of current level, active electrode separation, and temporal offset was presented once in random order, then every combination was repeated in a different random order until each stimulus combination was tested 10 times.
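The randomized block design described above can be sketched as follows. This is not the original MATLAB control software; the function name, argument names, and seed handling are hypothetical, and the stimulus values passed in would be chosen per experiment.

```python
import itertools
import random


def build_schedule(levels_db, separations_mm, offsets_us, n_repeats=10, seed=None):
    """Return a trial list in which every stimulus combination appears once
    per block, each block in a fresh random order, for n_repeats blocks."""
    rng = random.Random(seed)
    combos = list(itertools.product(levels_db, separations_mm, offsets_us))
    schedule = []
    for _ in range(n_repeats):
        block = combos[:]      # copy so each block is shuffled independently
        rng.shuffle(block)
        schedule.extend(block)
    return schedule
```

Shuffling within blocks, rather than across the whole session, keeps the repetitions of each condition spread evenly through the recording.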

Interchannel temporal offsets were expressed as the time from the onset of the first pulse to the onset of the second pulse, as shown in Figure 1. Temporal offsets ranged from 0 (simultaneous) to 2000 µs. We chose 2000 µs as the longest temporal offset because that is the longest offset that is possible given two channels stimulated at 250 pulses/s, as in the Spectral Peak (SPEAK) speech-processing strategy (Skinner et al. 1994). Same-phase stimulation of two channels indicates that pulses on both active electrodes were presented with the cathodic phase first (left panels of Fig. 1), whereas inverted-phase stimulation indicates that the pulse on one active electrode was initially anodic and the pulse on the other active electrode was initially cathodic (right panels of Fig. 1).
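The timing arithmetic can be made concrete with a brief sketch. It assumes biphasic pulses with no interphase gap (the text gives only the 80-µs phase duration), and all function names are hypothetical.

```python
PHASE_US = 80.0  # phase duration from the text


def max_offset_us(rate_pps: float, n_channels: int = 2) -> float:
    """Longest onset-to-onset offset when n_channels are evenly interleaved
    at rate_pps pulses/s per channel."""
    return 1e6 / rate_pps / n_channels


def biphasic_segments(onset_us: float, cathodic_first: bool = True,
                      phase_us: float = PHASE_US) -> list:
    """Return (start_us, end_us, sign) for the two phases; sign -1 is cathodic."""
    first = -1 if cathodic_first else 1
    return [(onset_us, onset_us + phase_us, first),
            (onset_us + phase_us, onset_us + 2 * phase_us, -first)]


def pulses_overlap(offset_us: float, phase_us: float = PHASE_US) -> bool:
    """True if the second pulse begins before the first pulse has ended."""
    return offset_us < 2 * phase_us
```

With these values, `max_offset_us(250)` returns 2000 µs, and a 160-µs offset is the shortest tested offset at which the two pulses no longer overlap in time.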

Figure 1

Schematics of two-channel stimuli. Each channel was stimulated with a biphasic pulse of duration 80 µs/phase, represented by the solid line in each panel. The broken lines represent the stimulus on the more apical channel. Temporal offsets were measured from the beginning of the first pulse to the beginning of the second. In each panel, temporal offsets of 0, 160, 320, and 2000 µs, respectively, are shown. Top and bottom panels represent the same- and inverted-polarity conditions.

Multichannel recording and spike sorting

Cortical activity was recorded with silicon-substrate multichannel recording probes (Center for Neural Communication Technology, Ann Arbor, MI; Drake et al. 1988; Najafi et al. 1985). Each recording probe had 16 recording sites along a single shank at intervals of 100 µm (center to center). The shank was 15 µm thick and 100 µm wide, tapering in width from 100 µm to 15 µm over the segment containing the recording sites. The multichannel recording probe permitted recording of spike activity simultaneously from 16 cortical sites.

Recording probes penetrated the primary auditory cortex in the right hemisphere, from caudodorsal to rostroventral and roughly parallel to the cortical surface. We attempted to position the probe in the middle cortical layers, aligned with the cochleotopic gradient along which the representation of cochlear place of stimulation changes most rapidly. In our previous study in which we used these probes to study cortical responses to tones, the 16 recording sites typically spanned about 2–3 octaves of the tonotopic frequency representation (Arenberg et al. 2000). In the guinea pig area A1, neurons sensitive to basal cochlea stimulation (high frequencies) are situated dorsocaudally, and apical cochlea (low frequencies) are situated ventrorostrally (Hellweg et al. 1977; Redies et al. 1989; Arenberg et al. 2000; Wallace et al. 2000; Bierer and Middlebrooks 2002). Prior to detailed study at each probe position, tuning properties of rostral and caudal cortical sites were estimated by observing responses to BP stimuli on the most apical and basal stimulus channels. If the reverse cochleotopic order was detected, indicative of the dorsocaudal field (area DC; Redies et al. 1989), the probe was retracted and placed further ventral and rostral in area A1.

Signals measured from the recording probe were amplified with a custom 16-channel amplifier, digitized at a 25-kHz rate, filtered, and then stored on the computer hard disk. Unit activity was isolated from the digitized signal offline using custom spike-sorting software (Furukawa et al. 2000). Spike times were stored at 20-µs resolution for further analysis. We sometimes encountered well-isolated single units, but most recordings were of unresolved clusters of a small number of units.

Data analysis

Detailed measurements of cortical activation patterns were obtained from one probe placement in each of 8 guinea pigs. At each probe placement, stable recordings of single and multiunit clusters were obtained at 16 recording sites for a total of 128 recording sites.

Threshold current levels based on cortical spike counts were determined by using procedures from Signal Detection Theory (Green and Swets 1966). In conditions in which only one channel was stimulated, ROC curves were computed from spike counts obtained on trials in which a stimulus was or was not present. In two-channel stimulation conditions, a fixed current level was present on one channel on all trials, and the ROC curve was computed from trials in which a stimulus on the second channel was or was not present. The area under an ROC curve was converted to a discrimination index (d′). Values of d′ were computed for current levels tested in 1- or 2-dB steps. Linear interpolation was used to estimate the current level corresponding to d′ = 1, which was taken as the threshold. We regard this as a fairly conservative estimate of threshold, compared to analysis of simple mean spike rates, inasmuch as the ROC analysis incorporated the trial-by-trial variability of spike counts on stimulus and nonstimulus trials. We designate the threshold for channel i by Θi. Channel interactions were quantified by threshold shifts, ΔΘ, which designate the changes in the threshold for channel i resulting from a constant-level stimulus on channel j. Threshold shifts were expressed in dB as the channel i threshold in the presence of a channel j stimulus minus the threshold in the absence of channel j: ΔΘ = Θi+j − Θi. Negative threshold shifts designate reductions in channel i thresholds resulting from stimulation of channel j.
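A minimal sketch of the threshold procedure is given below. The text does not say how ROC area was converted to d′; the sketch assumes the common equal-variance Gaussian relation d′ = √2·z(area), and clamps areas of exactly 0 or 1 with an arbitrary `eps`. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def roc_area(signal_counts, noise_counts):
    """Area under the ROC curve from spike counts on stimulus and nonstimulus
    trials (probability that a stimulus count exceeds a nonstimulus count,
    with ties counted as one half)."""
    wins = 0.0
    for s in signal_counts:
        for n in noise_counts:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(signal_counts) * len(noise_counts))


def dprime_from_area(area, eps=0.01):
    """Assumed conversion: d' = sqrt(2) * z(area); eps avoids infinite values."""
    area = min(max(area, eps), 1.0 - eps)
    return math.sqrt(2.0) * NormalDist().inv_cdf(area)


def threshold_db(levels_db, dprimes, criterion=1.0):
    """Linearly interpolate the level at which d' first reaches the criterion.
    Returns None if the criterion is never crossed within the tested levels."""
    for i in range(len(levels_db) - 1):
        d0, d1 = dprimes[i], dprimes[i + 1]
        if d0 < criterion <= d1:
            l0, l1 = levels_db[i], levels_db[i + 1]
            return l0 + (criterion - d0) * (l1 - l0) / (d1 - d0)
    return None


def threshold_shift_db(theta_two_channel, theta_single):
    """Threshold shift in dB; negative values mean the second channel
    lowered the threshold."""
    return theta_two_channel - theta_single
```

Under the assumed conversion, d′ = 1 corresponds to an ROC area of about 0.76. A return value of `None` from `threshold_db` covers the case in which the lowest tested level was already above criterion, so that only a bound on the threshold is available.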

The cortical image of any particular stimulus (as in Fig. 2) was represented by the distribution of stimulus-driven cortical activity across all recording sites and across poststimulus time (Bierer and Middlebrooks 2002). Cortical images were derived from simultaneous recordings at 16 cortical sites averaged across 10 trials. For the purpose of computing cortical images, spike rates were normalized at each recording site. That was accomplished by computing at each recording site the mean spike rate for each condition of stimulus channel, current level, and temporal offset, then taking the 5th and 95th percentile of the distribution of mean spike rates as the spontaneous rate and maximum rate, respectively. The range of spike rates between those rates was used to compute a normalization factor for each site. That normalization method emphasized stimulus-driven changes in activity rather than absolute spike numbers across channels. Cortical images are illustrated using contour plots (Fig. 2), which were drawn using the contour function in MATLAB. As was done previously, the centroid of the cortical image was defined as the normalized-spike-rate-weighted center of mass calculated from all the sites at which the firing rate was above threshold; the centroid computation collapsed spike rates across all time bins (Bierer and Middlebrooks 2002).
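The normalization and centroid computations can be sketched as follows. The text does not give the exact normalization formula; the sketch assumes a linear rescaling between the 5th and 95th percentiles, clipped to the range 0 to 1, and a percentile computed by linear interpolation between order statistics. The 100-µm site spacing is from the probe description; all names are hypothetical.

```python
SITE_SPACING_UM = 100.0  # center-to-center spacing of recording sites


def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def normalize_site(mean_rates):
    """Rescale one site's mean rates (one per stimulus condition) so that the
    5th percentile maps to 0 and the 95th percentile maps to 1 (assumed form)."""
    spont = percentile(mean_rates, 5)
    peak = percentile(mean_rates, 95)
    if peak <= spont:
        return [0.0 for _ in mean_rates]
    return [min(max((r - spont) / (peak - spont), 0.0), 1.0) for r in mean_rates]


def centroid_um(norm_rates, above_threshold):
    """Normalized-rate-weighted center of mass over sites that are above
    threshold; positions are measured from the most caudal site (index 0)."""
    num = den = 0.0
    for k, (rate, active) in enumerate(zip(norm_rates, above_threshold)):
        if active:
            num += rate * k * SITE_SPACING_UM
            den += rate
    return num / den if den > 0 else None
```

Here `norm_rates` is taken to be already collapsed across poststimulus time bins, as in the centroid definition above.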

Figure 2

Cortical images of one- and two-channel stimuli. Each panel represents the cortical image of one stimulus averaged across 10 trials. The left two columns represent responses to single-channel stimulation of channels 5 (first column) and 2 (second column). Current levels are expressed in dB re: 1 mA. Columns 3, 4, and 5 represent two-channel stimulation in the simultaneous condition with the current level on channel 2 fixed at 2, 1, and 0 dB, respectively, below the threshold for channel 2 alone (i.e., Θ2−2, Θ2−1, and Θ2). The rightmost column represents a condition in which the stimulus level on channel 2 was fixed at Θ2 with a temporal offset of 160 µs. For each cortical image, the abscissa represents poststimulus time and the ordinate represents cortical place relative to the most caudal recording site. Contours represent mean spike counts expressed as percent of the maximum count on each cortical channel; contours are drawn at 20, 40, 60, and 80% of the maximum count. Triangles to the right of each panel represent the centroid locations. Data are from animal GP02.

Statistical comparisons of threshold shifts between pairs of conditions were made using a nonparametric sign test. The nonparametric test was chosen because in many cases the threshold shift in one of a pair of conditions was so large that the magnitude of the shift could not be determined, even though the lower bound of the shift was measured to be larger than the threshold shift measured in the other condition.
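A sign test of this kind can be sketched in a few lines. Because it uses only the direction of each paired difference, a pair still contributes when one threshold shift is known only as a lower bound. The sketch assumes a two-sided test with ties discarded, which the text does not specify; the function name is hypothetical.

```python
from math import comb


def sign_test_p(n_pos: int, n_neg: int) -> float:
    """Two-sided sign test p-value, given the number of pairs in which the
    difference was positive and the number in which it was negative
    (ties are assumed to have been discarded beforehand)."""
    n = n_pos + n_neg
    if n == 0:
        return 1.0
    k = min(n_pos, n_neg)
    # Probability of k or fewer of the rarer sign under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

For instance, if the shift was larger in one condition for all 10 of 10 pairs, `sign_test_p(10, 0)` is 2/1024, or about 0.002.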

RESULTS

We refer to the characteristic spatiotemporal distribution of cortical spike activity elicited by a particular stimulus at one particular level as the cortical image of that stimulus. We begin by describing the cortical images of one- and two-channel cochlear implant stimuli. Then we describe spike-rate-versus-current-level functions, showing that sub-threshold stimulation of one channel could reduce the threshold for a second channel. We evaluate these threshold shifts at various cortical sites relative to the tonotopic representations of the two stimulus channels. Finally, we test the sensitivity of threshold shifts to the relative cochlear spacing, temporal offset, and relative polarity of two-channel stimuli.

Cortical images of one- and two-channel stimuli

Figure 2 presents examples of cortical images of one- and two-channel cochlear implant stimuli in the bipolar (BP) electrode configuration. The vertical columns of panels represent, from left to right, the cortical images of stimuli presented as follows: (1) to channel 5 alone; (2) to channel 2 alone; (3) to channel 5 with channel 2 stimulated simultaneously at Θ2 − 2 dB; (4) to channel 5 with channel 2 stimulated simultaneously at Θ2 − 1 dB; (5) to channel 5 with channel 2 stimulated simultaneously at Θ2 dB; and (6) to channel 5 with channel 2 stimulated at Θ2 dB with a 160-µs temporal offset. Each horizontal row of panels represents the response at a fixed current level on channel 5, except for the second column, which shows responses to stimulation of channel 2 alone. In each panel, the vertical axis represents the location along the cortical recording probe relative to the most caudal recording site, the horizontal axis shows time after the stimulus onset, and the contours represent the normalized spike rate at each recording site, increasing from 20% to 80% of maximum activity in steps of 20%. Single-channel stimulation produced spatially restricted foci of cortical activity at threshold levels (third row, left two columns). Centroids of the cortical images of near-threshold stimuli on channels 5 and 2 were located at 602 and 314 µm, respectively. The relative locations of those centroids were consistent with the known cochleotopic organization of the primary auditory cortex of the guinea pig in that the cortical image of the more apical electrode pair (channel 5) was located further rostral than that of the basal electrode pair (channel 2; Bierer and Middlebrooks 2002). As the current level was increased to the highest tested levels (bottom row), cortical images broadened in cortical extent to cover the entire recording array.

The stimulus current levels on channel 5 were constant across each row of cortical images in Figure 2 (except for column 2, in which channel 5 was not stimulated). When channels 2 and 5 were stimulated simultaneously (columns 3, 4, and 5), robust cortical responses were elicited at levels 2 dB or more below the thresholds for each channel stimulated alone. At a near-threshold channel 5 level (i.e., row 3), addition of a pulse on channel 2 at Θ2 − 2 dB produced a cortical image that encompassed both of the cortical areas that were activated by each channel individually. The centroid for low-level simultaneous stimulation of channels 2 and 5 was located between the two single-channel centroids.

When stimulation of channel 2 preceded stimulation of channel 5 by 160 µs (rightmost column), cortical responses were elicited at the second level shown, about 1 dB below the threshold for stimulation of channel 5 alone. That observation indicates that the influence of channel 2 on channel 5 persisted for at least 160 µs after the onset of the channel 2 stimulus.

In summary, the differences between two-channel cortical images and the images of the constituent single-channel stimuli include (1) a reduction of cortical response threshold for simultaneous or nonsimultaneous two-channel stimulation, (2) a shift of the cortical centroid of activity to a location intermediate to the centroids of activity elicited by the two single-channel stimuli, and (3) an extent of cortical activation greater than the area encompassed by the responses to each of the two single-channel stimuli individually.

Threshold shifts

Spike rates at nearly every recording site increased monotonically or increased to a plateau as current levels were increased. Figure 3 shows examples of spike-rate-versus-current-level functions at three recording sites, 300, 800, and 1100 µm relative to the caudalmost site on the recording probe. Each column of panels represents one recording site, and top, middle, and bottom rows of panels represent temporal offsets of 0, 160, and 320 µs, respectively. Lines marked with circles indicate rate-level functions obtained with stimulation of channel 5 alone; within each column, the differences among rows of panels in the channel-5-alone rate-level functions demonstrate variability among repeated measurements. The threshold for stimulation of channel 3 alone, Θ3, was determined as described in the Data Analysis subsection. The lines marked with squares or asterisks represent the rate-level functions obtained for stimulation of channel 5 while the level on channel 3 was held constant at Θ3 or Θ3 − 1 dB, respectively. In every panel in the figure, stimulation of channel 3 increased the spike rate in response to the channel 5 stimulus, with the effect of displacing rate-level functions to the left. Filled arrowheads on the abscissa indicate thresholds determined for the channel-5-alone and two-channel conditions (Θ5 and Θ5+3, respectively). We designate a reduction in current threshold for channel 5 resulting from addition of the channel 3 stimulus by a negative threshold shift. In some cases the threshold shift was so great that we failed to test sufficiently low levels on channel 5, as in the upper-left panel in Figure 3. In that case, the magnitude of the measured threshold shift must be regarded as a lower bound.

Figure 3

Rate-versus-level functions for three recording sites and three temporal offsets. Each column of panels represents responses obtained from one recording site, 300, 800, or 1100 µm relative to the most caudal site. Rows of panels represent responses to simultaneous (top), 160-µs offset (middle) and 320-µs offset (bottom). For each panel, the abscissa represents the stimulus current level delivered to channel 5 and the ordinate represents the normalized spike rate. Rate-level functions are drawn for stimulation of channel 5 alone (circles) or with the current level on channel 3 constant at its threshold of −6 dB re: 1 mA (squares) or constant at −7 dB re: 1 mA (asterisks). Filled triangles indicate threshold current levels in the one- and two-channel conditions. Data are from animal GP43.

In the examples illustrated in Figure 3, stimulation of channel 3 at its threshold level simultaneously with channel 5 stimulation (top row) resulted in threshold shifts ranging from −6.1 to −7.2 dB. When the channel 5 stimulus was delayed relative to channel 3 (middle and bottom rows), the magnitudes of threshold shifts decreased, but threshold shifts of a few dB were observed even when the temporal offset was 320 µs. Note that even a threshold shift as small as 1 or 2 dB is likely to be of some importance given that the dynamic ranges over which rate-level functions increased in the single-channel condition tended to be no broader than a few dB.

The magnitudes of threshold shifts tended to vary among cortical recording sites. For instance, in the example in Figure 3, threshold shifts were greatest at the 300-µm cortical site and decreased at the 800- and 1100-µm sites. The dependence of the magnitudes of threshold shifts on the locations of cortical recording sites is considered later.

Figure 4 shows examples of threshold shifts across the 16-channel recording array. Columns of panels represent MP, BP, and TP electrode configurations, and rows represent temporal offsets of 0, 160, and 320 µs. The data points represent shifts in the threshold for channel 5 stimulation resulting from stimulation of channel 2 at Θ2 − 1 dB (circles) or Θ2 − 2 dB (squares); filled symbols indicate cases in which the threshold shift was greater than could be computed from the measured data. The largest shifts in the channel 5 threshold were measured in response to simultaneous stimulation of the two channels (top row). In the MP stimulation case, all recording sites and both channel 2 levels showed threshold reductions of 8 dB or more; the threshold shifts recorded on the caudal half of the recording array were greater than could be computed from recorded data. In the BP case, threshold shifts were considerably smaller, but also varied with cortical place. With channel 2 fixed at Θ2 − 1 dB, caudal recording sites showed threshold shifts around −2 dB, whereas there was little or no threshold shift recorded at rostral sites. In the TP case, a threshold shift was recorded only at the most caudal sites. Data points are missing in the TP condition at sites at which the cortical response was too weak to determine a reliable threshold; this is expected from the restricted cortical images of TP stimuli (Bierer and Middlebrooks 2002). The magnitude of threshold shifts in the TP configuration was rather variable across animals but was negligible in many cases.

Figure 4

Threshold shifts across cortical recording sites for three stimulus configurations and three temporal offsets. The left, middle, and right columns represent threshold shifts for MP, BP, and TP configurations, respectively. The abscissa and ordinate are cortical place along the recording electrode and threshold shift, respectively. Threshold shifts indicate thresholds in the presence of a channel 2 stimulus relative to the threshold in the channel-5-alone condition. Circles and squares indicate conditions in which the channel 2 stimulus was presented at levels 1 or 2 dB below the threshold for response to channel 2 alone. Filled symbols indicate instances at which the lowest level tested was above threshold, i.e., the true threshold was lower than the plotted value. The arrows at the lower edge of each plot indicate the locations of cortical centroids for channels 2 and 5. Data are from animal GP02.

In the illustrated example, addition of a temporal offset of 160 or 320 µs essentially abolished the threshold shifts in the BP and TP configurations. In the MP configuration, threshold shifts in the presence of temporal offsets were restricted to cortical sites near the centroid of the cortical image of channel-2-alone stimuli (indicated by triangles on the abscissas).

The example illustrated in Figure 4 is representative of all eight cases in the following respects: The largest threshold shifts were observed in the simultaneous MP configuration. In the nonsimultaneous MP conditions, the magnitude of threshold shifts tended to decrease with increasing temporal offset. The dependence on temporal offset was more variable in the BP and TP configurations, sometimes showing little or no shift at any temporal offset (see the following subsection). Within each of the simultaneous MP and BP conditions, the largest threshold shifts tended to be recorded at cortical sites near the centroid of the cortical image of the near-threshold fixed-level stimulus, in this case channel 2. Threshold shifts in the TP configuration generally were smaller than for MP or BP and were entirely absent in some cases.

Shifts in threshold tended to be greatest in magnitude at cortical sites near the centroid of the cortical image of the near-threshold fixed-level channel (channel 2 or 3 in the previous examples). That can be seen in the previous example (shown in Fig. 4), in which the triangles along the abscissa indicated the locations of cortical centroids for channels 2 and 5. Figure 5 shows examples of the threshold shift measured near the centroid of the fixed-level channel compared with that near the channel for which the threshold was measured (e.g., channels 2 and 5, respectively, for the examples shown in Figs. 3 and 4). Data are shown for the condition in which the current on channel 2 or 3 was fixed at 1 dB below the threshold for that channel. Threshold shifts, collapsed across MP, BP, and TP configurations, were significantly greater in magnitude at the cortical centroid of the fixed-level channel than at the centroid of the channel for which thresholds were measured; that comparison was significant (p < 0.005) for temporal offsets of 0, 160, and 320 µs, but not significant for 640 µs (data not shown), whether the more basal channel or the more apical channel was stimulated at a fixed level. That result implies that a subthreshold pulse at a particular cochlear site increased the sensitivity at that site to a simultaneous or later pulse presented at another cochlear site.

Figure 5

Differences between threshold shifts measured at the channel 2 or channel 3 centroid compared with that measured at the channel 5 centroid. Points falling above the positive diagonal indicate threshold shifts were greater for sites near the channel 2 or channel 3 centroid. The distribution is collapsed across active electrode separations of 1.5 and 2.25 mm.

We anticipated that threshold shifts would be sensitive to the separation between the two active intracochlear electrodes. That hypothesis was difficult to test in this study because of the limited number of active electrode separations that could be implemented on a 6-electrode cochlear implant; for instance, only 2 nonoverlapping TP channels could be implemented. Figure 6 compares the threshold shifts for 1.5- and 2.25-mm active electrode separations in the BP electrode configuration; left and right columns of panels show conditions in which the more basal channel (left) or more apical channel (right) was fixed at a near-threshold level. In the BP configuration, threshold shifts were significantly greater in magnitude (p < 0.001) for 1.5- than for 2.25-mm active electrode separations; the difference was significant for all temporal offsets from 0 to 640 µs and for conditions in which either the more basal or more apical channel was fixed in level. The MP configuration showed no consistent difference between 1.5- and 2.25-mm active electrode separations for either temporal offset (data not shown), presumably because there was so much overlap in electrical fields between pairs of electrodes. These findings indicate that channel interaction with the BP configuration was influenced by the distance between the two active electrodes of the two-channel stimuli, whereas channel interaction with the MP configuration was strong for both tested active electrode separations.

Figure 6

Threshold shifts for two BP channel separations. Each panel shows the threshold shifts measured in response to two-channel stimulation with a channel separation of 2.25 mm compared with that of 1.5 mm. Symbols represent every recording channel in which thresholds could be computed for both separations (i.e., 8 animals, as many as 16 channels per subject). The left column of panels shows threshold shifts in the condition in which the current on the more basal channel (i.e., channel 2 or 3) was fixed at 1 dB below its threshold, and the right column of panels shows threshold shifts for the condition in which the current on the more apical channel was fixed. Upper and lower rows of panels represent simultaneous and 160-µs offset conditions. Points falling below the positive diagonal indicate threshold shifts that were greater for smaller active electrode separations.

Temporal offset between two channels of stimulation

As shown in the previous examples, threshold shifts most often decreased with increasing temporal offset between pulses on two channels. Figure 7 shows threshold shifts as a function of temporal offset for 8 animals and 3 electrode configurations. Thresholds were measured for channel 5 in the presence of a stimulus on channel 2 or 3 fixed at 1 dB below the channel 2 or channel 3 threshold; the plotted threshold shifts were measured at the centroid for channel 2 or 3. Threshold shifts observed for simultaneous and nonsimultaneous conditions are shown with filled and open symbols, respectively. In the MP configuration, threshold shifts consistently were largest in the simultaneous condition; they were rather variable in the simultaneous BP and TP conditions. Threshold shifts in the nonsimultaneous conditions generally declined with increases in the temporal offset. Data from the MP configuration showed a roughly logarithmic dependence, declining at a rate of 0.5 dB per doubling of temporal offset (range across 7 animals: 0.3–1.0). Again, the results were more variable for the BP and TP configurations. A subthreshold pulse on one channel in some cases lowered the threshold of a pulse on a second channel that followed in time by up to 640 µs.

Figure 7

Threshold shifts as a function of temporal offset. Each panel shows data from one animal, indicated by the number in the lower right corner. Symbol shapes represent electrode configuration. The filled symbols represent responses to the simultaneous condition. The open symbols represent threshold shifts measured in response to varying temporal offsets from 160 to 2000 µs. The lines show the least-squares fits to the nonsimultaneous MP data (open squares). In each case, threshold shifts were measured at the cortical site closest to the fixed-level (channel 2 or 3) centroid.
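The logarithmic dependence described above can be expressed as a straight-line fit of threshold shift against log2 of the temporal offset, so that the slope reads directly in dB per doubling. The following is a minimal sketch of such a fit, not the analysis code used in the study. The function name and the synthetic data are hypothetical; the data are constructed to lie exactly on a line with a slope of 0.5 dB per doubling so that the fit can be checked.

```python
import math

def fit_db_per_doubling(offsets_us, shifts_db):
    """Least-squares fit of shift = intercept + slope * log2(offset).

    Returns (slope, intercept); the slope is in dB per doubling of offset.
    """
    x = [math.log2(t) for t in offsets_us]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(shifts_db) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, shifts_db))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Nonsimultaneous temporal offsets used in the study (microseconds).
offsets_us = [160, 320, 640, 2000]

# Synthetic (hypothetical) threshold shifts: a 3-dB reduction at 160 us
# that shrinks in magnitude by 0.5 dB for each doubling of the offset.
shifts_db = [-3.0 + 0.5 * math.log2(t / 160) for t in offsets_us]

slope, intercept = fit_db_per_doubling(offsets_us, shifts_db)
print(f"slope = {slope:.2f} dB per doubling of temporal offset")
```

Because threshold reductions are plotted as negative shifts, a positive slope here corresponds to a decline in the magnitude of the shift with increasing offset.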

The data presented above represent conditions in which thresholds were measured for a test pulse that was simultaneous with or was preceded by a near-threshold fixed-level pulse. We also tested conditions in which the fixed-level stimulus followed the test pulse for which the threshold was measured. Figure 8 compares threshold shifts under conditions in which the fixed-level channel led or trailed the channel for which the threshold shift was measured. The three panels show data for temporal offsets of 160, 640, and 2000 µs. Data are combined across all animals, recording sites, electrode configurations, active electrode separations, and fixed levels (Θ2,3 − 1 and Θ2,3 − 2 dB). For temporal offsets up to 640 µs, there was a small but significant tendency for threshold shifts to be larger when the fixed-level channel was first (mean differences generally <1 dB; p < 0.001 for all conditions except for 640-µs offset with the more apical channel fixed: p > 0.05). The effect of temporal order was reversed for longer temporal offsets. A near-threshold pulse on one channel tended to elevate the threshold of a pulse on a second channel that followed by 2000 µs (p < 0.001).

Figure 8

Sensitivity to the order of fixed- and variable-level channels. The plots compare threshold shifts in conditions in which the pulse on the fixed-level channel led (horizontal coordinate) or lagged (vertical coordinate) the pulse on the channel that was varied in current level. Data are compiled across all animals, configurations, and active electrode separations.

Polarity inversion between channels

We tested in four animals a condition in which the polarity of the electrical current was inverted between the two stimulus channels. That is, in the inverted-polarity condition, the biphasic pulse on electrode 5 was initially cathodal and the biphasic pulse on electrode 2 or 3 was initially anodal. Examples from one representative animal are shown in Figure 9. The inverted-polarity simultaneous MP condition was unlike every other simultaneous condition in this study in that thresholds were elevated by around 4 dB at all cortical sites in this example. The MP condition with a 160-µs temporal offset, in contrast, showed a particularly strong threshold reduction. Other conditions in this animal produced threshold reductions of various magnitudes. The results from the illustrated example were consistent across inverted-polarity conditions in all animals in that the simultaneous MP condition produced a threshold elevation (range among animals, measured at the cortical centroid of the fixed-level channel: +3.2 to +4.7 dB, median = 4.1 dB), whereas any threshold shift in any of the other conditions was a threshold reduction. Threshold reductions in the MP 160-µs-temporal-offset condition were most reliably large, ranging from −0.7 to −6.7 dB (median = −6.4 dB). Threshold shifts in other MP conditions and all BP and TP conditions ranged among animals from +0.5 to −4.8 dB (median = −0.7 dB). Possible mechanisms for threshold elevations and reductions in the inverted-polarity conditions are considered in the Discussion.

Figure 9

Threshold shifts across cortical recording sites for inverted-phase stimulation for MP, BP, and TP electrode configurations and three temporal offsets. Conventions are as in Fig. 4. Data are from animal GP38.

DISCUSSION

The results of the present study demonstrate that the current level required to elicit cortical responses by a single cochlear implant channel was influenced by the presence of threshold or subthreshold activity on a second channel. The magnitude and direction of that influence were dependent on several factors: (1) the electrode configuration, (2) the spatial separation of the two active electrodes, (3) the relative timing of the two stimuli, and (4) the relative polarity. Conversely, there was only a weak dependence on the order of channels (for temporal offsets up to ~640 µs) or on the relative apical and basal locations of fixed- and varying-level channels. These results provide insight into possible mechanisms of channel interaction and have implications for the design of speech-processing strategies and/or electrode design for cochlear prostheses.

Relation to previous studies

There have been few previous physiological studies of responses to multiple cochlear stimulating electrodes. In one early study, Merzenich and White (1977) recorded responses of neurons in the cat inferior colliculus to stimulation of two intracochlear electrodes. When the electrodes were stimulated simultaneously, the collicular responses were substantially greater in a same-polarity condition compared with an inverted-polarity condition. Contrary to the present results, however, when pulses were delivered in phase through the two electrodes nonsimultaneously (offset by as little as 75 µs), a pulse on one electrode had no effect on the threshold for another electrode.

One can gain some understanding of responses to nonsimultaneous pulses on two nearby electrodes by examining the responses to pairs of pulses presented on a single electrode. Cartee et al. (2000) recorded cat auditory nerve responses to paired pulses. Their electrical stimuli were pseudomonophasic, in contrast to our biphasic pulses, and were presented through an electrode inserted into the internal auditory meatus, compared with our intrascalar electrodes. Paired-pulse summation was observed, meaning that a subthreshold leading pulse resulted in an increased probability of a response to a following pulse. The summation decreased with increasing interpulse interval with a time constant of 147 µs. The summation time constant appears to be shorter for a meatal stimulation site compared with an intracochlear site. Cartee et al. (2000) estimated an intracochlear summation time constant of 504 µs from data shown by Dynes (1996). The many methodological differences preclude close comparison with the present results, but summation time constants of a few hundreds of microseconds are of the same order of magnitude as the sensitivity to temporal offsets that we observed.
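To give a sense of scale for these time constants, the sketch below computes how much of a leading pulse's influence would remain at each temporal offset tested in the present study. It assumes a single-exponential decay, which is a simplification adopted here for illustration and is not necessarily the model fitted by Cartee et al. (2000). The function and constant names are hypothetical; the time constants are the 147-µs and 504-µs values quoted above.

```python
import math

def residual_fraction(offset_us, tau_us):
    """Fraction of the leading pulse's effect remaining after offset_us,
    assuming (as a simplification) single-exponential decay."""
    return math.exp(-offset_us / tau_us)

TAU_MEATAL_US = 147.0          # Cartee et al. (2000), meatal electrode
TAU_INTRACOCHLEAR_US = 504.0   # estimated by Cartee et al. from Dynes (1996)
OFFSETS_US = [160, 320, 640, 2000]  # nonsimultaneous offsets in this study

for offset in OFFSETS_US:
    meatal = residual_fraction(offset, TAU_MEATAL_US)
    intracochlear = residual_fraction(offset, TAU_INTRACOCHLEAR_US)
    print(f"{offset:5d} us: meatal {meatal:.3f}, intracochlear {intracochlear:.3f}")
```

Under this assumption, the intracochlear time constant leaves a sizeable residual at 640 µs and almost none at 2000 µs, in line with the range of offsets over which threshold reductions were observed.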

The present physiological results can be compared with published human psychophysical results. Relevant psychophysical results are available from studies of detection thresholds, loudness summation, and pitch perception. White et al. (1984) measured detection thresholds for pairs of biphasic pulses that were presented simultaneously, one on each of two intracochlear electrodes. They compared thresholds in same- or inverted-phase conditions as an indication of the magnitude of channel interactions. Consistent with the present physiological results, they found that channel interactions increased with increasing proximity of electrodes and were greater in an MP configuration than in a BP electrode configuration.

One common psychophysical measure of channel interaction is based on subjects’ estimates of loudness (Shannon 1983; White et al. 1984; Tong and Clark 1986; McKay and McDermott 1996; McKay et al. 2001). In such procedures, subjects compare the loudness of a two-channel stimulus with that of a one-channel stimulus. That a two-channel stimulus is usually perceived as louder is referred to as loudness summation. We must use some care in comparing measures of loudness summation with our measures of threshold reduction because loudness summation involves estimates of the perceived magnitudes of suprathreshold stimuli, whereas our measurements were made around threshold. Nevertheless, studies of loudness summation provide some indication of the parameters that influence channel interactions.

Shannon (1983) tested loudness summation using simultaneously presented sinusoidal electric stimuli. He found that loudness summation tended to follow expectations based on simple vector summation of electrical fields: loudness was increased in same-polarity stimulus conditions and decreased in inverted-polarity conditions. The MP electrode configuration produced greater spatial overlap and greater loudness summation than did the BP configuration, as in the present results. White et al. (1984) tested loudness summation in response to single biphasic pulses presented nonsimultaneously, one to each of two channels. There were considerable intersubject differences, but subjects’ loudness judgments generally were sensitive to interpulse temporal offsets of up to 5 ms.

Tong and Clark (1986) tested loudness summation using interleaved pulse trains. They found that loudness tended to increase with increasing cochlear separation between active cochlear electrodes. That result is consistent with results from acoustical stimulation studies that show that loudness tends to increase with increasing separation in frequency between two tones (Plomp 1976). One interpretation of the Tong and Clark results is that the increased loudness is a consequence of increased spread of excitation in central structures resulting from increased spread of excitation in the cochlear nerve. Consistent with that view, the present physiological results showed that the cortical images of two-channel stimuli encompassed the images of either channel alone (e.g., Fig. 2). We note, however, the possibility of other models of loudness growth that do not involve spread of cochlear nerve excitation (Zeng and Turner 1991; Hellman 1994).

McKay et al. (2001) tested loudness summation using interleaved pulse trains. They found a rather complicated interaction among several factors including active electrode separation, pulse rate, stimulus level, and electrode configuration. All the loudness matches in that study were performed at the midpoint or the maximum of the dynamic range. At the lower level, but not the higher, the closest active electrode separation (0.75 mm) produced greater loudness summation than did 2.25-mm or greater separations. The dependent variable in the present study was detection threshold, which may be regarded as the lowest point in the neural dynamic range. At that level, we observed greater threshold reduction with 1.5-mm than with 2.25-mm active electrode separations, but only in the BP configuration. Note that we could test only a limited range of electrode separations because of the limited number of electrodes in our guinea pig cochlear implant.

Channel interaction also can influence pitch perception. Pitch perception in cochlear implant studies refers to the perceptual dimension in which subjects rank successive cochlear implant channels. Both simultaneous and nonsimultaneous activation of two cochlear implant channels have been shown to elicit a single pitch that is intermediate to the pitch of the two channels stimulated individually. An intermediate pitch was reported by subjects using either broad (MP) or more restricted electrode configurations (BP and common ground) (MP: Townshend et al. 1987; BP/common ground: McDermott and McKay 1994; McKay and McDermott 1996). The ratio of currents delivered to each stimulus channel influenced the location of the intermediate pitch: as the current level on channel A was increased, the intermediate pitch approached that of channel A, and vice versa for channel B (Townshend et al. 1987; McDermott and McKay 1994). In our study, the centroids of our cortical images varied systematically with the cochlear place of stimulation (Bierer and Middlebrooks 2002; Middlebrooks and Bierer 2002), presumably in analogy with psychophysical pitch perception. Cortical images of two-channel stimuli showed a broad single peak encompassing the images of the two component stimulus channels. That might be analogous to the intermediate pitch that is reported by human implant users.

Possible mechanisms of channel interaction

Interactions between stimulated channels are likely to take place at multiple levels of the auditory system, from the cochlea to the cortex. In the central nervous system, studies that use acoustic stimuli show interchannel interaction in the form of lateral (or two-tone) inhibition. Two-tone inhibition has been demonstrated in the cochlear nucleus (e.g., Young 1998), the inferior colliculus (e.g., Ramachandran et al. 1999), and the auditory cortex (e.g., Shamma et al. 1993). In our studies, even the most restricted electrical stimuli (i.e., TP configuration) activate cortical regions as large as those activated in normal-hearing guinea pigs by 1-octave-wide noise bands (Arenberg et al. 2000; Bierer and Middlebrooks 2002; present study). That result indicates that all of our electrical stimuli are likely to activate inhibitory sidebands of neuronal frequency response areas. Indeed, lateral inhibition probably shapes all the single-channel cortical images that we record (Bierer and Middlebrooks 2002). Two-channel stimulation presumably further activates the lateral inhibitory surround areas. Nevertheless, lateral inhibition would suppress responses, whereas the reductions in thresholds that we observed indicate that responses were enhanced. If lateral inhibition contributes to the channel interactions observed in the present study, it must be less prominent than other facilitatory factors.

The threshold shifts that we observed were most likely dominated by channel interaction that occurred within the cochlea. In the cochlea, we must consider direct vector summation of electrical stimuli as well as residual effects from charge stored on membranes and from activation of voltage-gated ion channels. Direct summation would have occurred in conditions in which there was no temporal offset between channels. Simple addition would predict that simultaneous in-phase stimulation of two nearby electrodes at equal levels would produce a current roughly 6 dB greater than the current from either electrode alone, although the exact value of the current increment would depend on details of local current paths, electrode impedances, and other factors. The expectation of current summation was borne out by the observed threshold shifts (e.g., Fig. 3). Threshold shifts were greater in the MP configuration than in the BP or the TP configuration, presumably because greater cochlear current spread produced by the MP configuration resulted in greater overlap of current fields from the two electrodes. Current summation in the BP and TP configurations also is complicated by the local field geometries that result from multiple intracochlear active and return electrodes. Simultaneous stimulation with multiple BP and TP channels is worthy of further empirical and theoretical study.
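The 6-dB figure follows from doubling the current: 20·log10(2) ≈ 6.02 dB. The sketch below makes that arithmetic explicit and adds a hypothetical `overlap` factor (0 for fully separate current fields, 1 for complete overlap) to illustrate why less overlap would yield a smaller increment. The overlap factor and the function name are assumptions introduced for illustration and do not come from the study.

```python
import math

def summed_level_db(overlap=1.0, same_polarity=True):
    """Change in effective current, in dB re one electrode alone, when a
    second electrode at equal level is stimulated simultaneously.

    overlap is a hypothetical 0..1 weight on the second electrode's
    contribution at the neural site; it is an illustrative assumption.
    """
    sign = 1.0 if same_polarity else -1.0
    ratio = 1.0 + sign * overlap
    if ratio <= 0.0:
        return float("-inf")  # complete cancellation in this simple model
    return 20.0 * math.log10(ratio)

print(f"full overlap, same polarity: {summed_level_db(1.0):+.2f} dB")
print(f"half overlap, same polarity: {summed_level_db(0.5):+.2f} dB")
print(f"half overlap, inverted:      {summed_level_db(0.5, False):+.2f} dB")
```

In this simple scalar model, inverted polarity reduces the effective current, which corresponds in sign to the threshold elevation seen in the simultaneous inverted-polarity MP condition; real current paths are not homogeneous, so the values are indicative only.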

The inverted-polarity condition represents a special case in which the observed results appear to reflect multiple complex geometries of current paths. We consistently observed an elevation of thresholds in the inverted-polarity simultaneous MP condition. In principle, one could model that condition as a vector subtraction of currents, although such a model would be complicated by nonhomogeneities in current paths. Intuitively, however, it is useful to regard this condition as reducing the net current flow from intracochlear source electrodes to excitable neural elements while increasing current flow from one intracochlear electrode to the other by way of a low-impedance path through the perilymph. In effect, the two intracochlear electrodes in opposite polarity would function to some extent as a single bipolar pair, which we have shown to have a higher threshold than a monopolar electrode (Bierer and Middlebrooks 2002). All other inverted-polarity conditions tended to produce threshold reductions, although the likely mechanisms differ among conditions. In the condition of simultaneous BP stimulation of channels 2 and 5, the return electrode of channel 2 (i.e., electrode 3) and the active electrode of channel 5 (electrode 5) were initially cathodal (and electrodes 2 and 6 were initially anodal), so one can think of that configuration as consisting of a double-sized active electrode (i.e., from electrodes 3 and 5) flanked by adjacent electrodes 2 and 6. In the case of a temporal offset of 160 µs, the inverted-polarity condition effectively elongated the stimulus phase duration. That is, the cathodal second phase of the first biphasic pulse continued into the cathodal first phase of the trailing pulse. In our previous study (Bierer and Middlebrooks 2002), we found that cortical thresholds tended to be reduced by an average of 4.7 dB per doubling of phase duration, which is generally consistent with the present results from the 160-µs condition.
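The phase-duration relation cited above can be applied as a simple log2 scaling. The following sketch converts a change in effective phase duration into a predicted threshold change, using the average slope of 4.7 dB per doubling from our previous study. The function name and the example durations are hypothetical placeholders, not stimulus parameters taken from the present study.

```python
import math

DB_PER_DOUBLING = 4.7  # average threshold reduction per doubling of phase duration

def predicted_threshold_change_db(old_duration_us, new_duration_us,
                                  db_per_doubling=DB_PER_DOUBLING):
    """Predicted threshold change (dB) when the effective phase duration
    changes; negative values are threshold reductions."""
    return -db_per_doubling * math.log2(new_duration_us / old_duration_us)

# Hypothetical example: the effective cathodal phase is doubled in length.
change = predicted_threshold_change_db(100.0, 200.0)
print(f"predicted threshold change: {change:+.1f} dB")
```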

Channel interactions in same-polarity nonsimultaneous conditions (and inverted-polarity conditions with temporal offsets greater than 160 µs) must reflect a residual influence of the leading pulse on a trailing pulse. Residual effects could include stimulus current integrated by the resistance and capacitance of neural membranes and the activation of voltage-sensitive ion currents. Both of these factors presumably contribute to the time constant for summation of cochlear stimuli as measured with paired-pulse summation (e.g., Cartee et al. 2000) or with measures of chronaxie, which is the threshold duration of a pulse that is twice the amplitude of a threshold pulse of long duration (Loeb et al. 1983). Various investigators have reported such time constants in the range of roughly 150–500 µs (Loeb et al. 1983; Cartee et al. 2000; van den Honert and Stypulkowski 1984; Dynes 1996). In the present study, for temporal offsets up to ~640 µs, a subthreshold depolarization caused by a trailing pulse would have added to depolarization remaining from the leading pulse, thus reducing the threshold for activation by the trailing pulse. The longest temporal offsets used in the present study (2000 µs) were longer than the range of reported time constants for cochlear electrical summation. We found that a subthreshold leading pulse tended to elevate slightly the threshold for a pulse that trailed by 2000 µs. One possible explanation for that observation is that the leading pulse would have had time to trigger activation of voltage-sensitive potassium channels and inactivation of voltage-sensitive sodium channels, resulting in a partial refractory state and elevating the threshold for the trailing pulse (Hille 2001). Published time constants for refraction for scala tympani stimulation are around 1.5 ms (Dynes 1996; Javel et al. 1987; Parkins 1989; Bruce et al. 1999) and 0.7 ms for meatal stimulation (Cartee et al. 2000).
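The definition of chronaxie given above can be illustrated with a strength-duration curve. The sketch below assumes the hyperbolic (Weiss) form, in which threshold current equals the long-duration threshold (rheobase) multiplied by (1 + chronaxie/duration); that choice of model is an assumption made here for illustration, and the cited studies may have used other forms. The numerical values are hypothetical.

```python
def threshold_current(rheobase, chronaxie_us, duration_us):
    """Threshold current for a pulse of duration_us under the Weiss
    strength-duration relation (assumed model)."""
    return rheobase * (1.0 + chronaxie_us / duration_us)

# Hypothetical values: rheobase in arbitrary units, chronaxie of 300 us.
RHEOBASE = 1.0
CHRONAXIE_US = 300.0

for duration in (100.0, 300.0, 1000.0, 10000.0):
    current = threshold_current(RHEOBASE, CHRONAXIE_US, duration)
    print(f"{duration:7.0f} us: threshold = {current:.2f} x rheobase")
```

At a pulse duration equal to the chronaxie, the threshold is exactly twice the rheobase, which matches the definition in the text.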

Implications for speech-processor design

Speech-processing strategies that are in use for cochlear implants can be divided into simultaneous analog and interleaved pulsatile strategies; recent designs also have presented pulsatile stimuli simultaneously to widely separated pairs of channels (Zimmerman-Phillips and Murad 1999). In simultaneous analog strategies, stimuli are presented to multiple stimulus channels simultaneously. Our results from simultaneous conditions show that channel interactions consistently were stronger in the MP configuration than in the BP or the TP configuration. For that reason, one would expect that speech-processing strategies that use simultaneous stimulation would benefit from use of a BP or a TP configuration, which would result in reduced channel interaction and enhanced multichannel information transmission.

The rationale for use of interleaved pulsatile strategies is that nearby electrodes will never be stimulated simultaneously, thereby avoiding direct summation of current fields. Our results support that rationale by demonstrating that channel interactions, as reflected in threshold shifts, generally were substantially stronger in simultaneous compared with nonsimultaneous conditions. Our results show, however, that threshold shifts were not entirely eliminated by temporal offsets, but instead could persist for offsets of 640 µs or longer. Inasmuch as threshold shifts tended to decline with increasing temporal offsets, variation in the relative timing of pulses on any pair of channels could result in variations in the effective strength of the stimulus on each channel. The threshold shifts measured in nonsimultaneous conditions normally were only a few dB in magnitude, but a few dB is a substantial fraction of the dynamic range of cortical neurons in response to electrical cochlear stimulation.

The inverted-polarity condition that we tested has little practical relevance to any pulsatile strategy that presently is in use, since pulse trains normally would all be in the same polarity. That condition, however, is of some interest in relation to simultaneous analog strategies. In such strategies, the acoustic signal is filtered by a bank of bandpass filters, then the output of each filter is led to a cochlear electrode. Differences in the passbands of adjacent filters would result in differences in phase delays in the signals sent to each electrode, resulting in between-electrode phase differences of as much as 180° for some frequencies. For that reason, the threshold elevations (in the simultaneous MP condition) or threshold reductions that we observed might mirror conditions that occasionally are present in clinical devices.

The present results likely underestimate the magnitude of channel interactions. Most of our analyses examined the effects of a subthreshold pulse on one channel on the responses to a pulse on a second channel. In practical use, speech processors present suprathreshold stimuli to multiple implant channels, so one would expect each channel to have substantial impact on the effectiveness of other channels. Also, we studied only single pulses on one or two channels, whereas any practical stimulus consists of modulated trains of pulses (i.e., in the case of a pulsatile strategy). Temporal integration among successive pulses on each channel presumably would have some influence on temporal details of interactions among channels. Despite these limitations, however, the results provide insights into the influence on channel interactions of temporal and spatial characteristics of pulses and lead to some considerations for speech-processor design.