INTRODUCTION

In a typical cochlear implant (CI), an acoustic signal is divided into several frequency bands, and the temporal envelope extracted from each band is used to modulate pulse trains that are delivered to the appropriate implanted electrodes (Loizou 1998). The CI listener’s spectral resolution is limited by the number and location of the implanted electrodes. Given this limited spectral resolution, the temporal envelopes delivered to each stimulating electrode provide useful information for speech recognition, especially for identifying a talker or the talker’s gender. Envelope cues may also be important in auditory scene analysis (Bregman 1990). Although cochlear implants generally perform well in quiet, their performance is severely degraded in noisy environments, presumably because they do not provide sufficient information to help the listener separate signals from noise. In normal hearing, temporal cues such as onset and offset times or coherent modulations may help listeners identify discrete auditory objects, for instance, when the listener must stream and segregate competing sources that do not necessarily produce distinct spectral patterns. Such mechanisms would require complex across-channel temporal processing.

Modulation detection interference (MDI) experiments have provided some insights into these across-channel temporal processing mechanisms. In a typical MDI experimental setup, listeners are asked to detect fluctuations in a signal in the presence of a fluctuating masker; the signal and masker are spectrally separated, so that the masking effect is primarily due to the competing fluctuations in the masker. In normal-hearing listeners, MDI experiments have shown that when the masker and signal contain similar fluctuations, it is significantly harder to detect modulation within the signal than when they have different temporal envelopes, even when the masker and the signal carriers are spectrally remote (Yost and Sheft 1994; Bacon and Konrad 1993). Such findings suggest that the envelopes of the masker and the signal interact at some retrocochlear processing stage. While these experiments are intended to explore across-channel interactions, other experiments that explicitly investigate within-channel interactions have shown similar effects. Within-channel effects have been termed “modulation masking” (MM) to distinguish them from MDI effects. MM and MDI experiments have also shown that a wide range of masker modulation frequencies can perturb the modulation threshold of a signal, with peak MDI occurring at or near the signal modulation frequency (Houtgast 1989; Yost et al. 1989; Bacon and Opie 1994; Ewert and Dau 2000; Ewert et al. 2002).

Several investigators have studied the effects of relative phase between the masker and the signal modulators. Yost and Sheft (1989) found that listeners were able to discriminate modulation phase differences in stimuli separated by more than an octave. However, relative phase did not have a large effect in a follow-up MDI experiment with similar stimuli; in addition, the phase effects were subject-dependent. In a more recent study using a 4-kHz masker and a 1-kHz probe, Yost and Sheft (1994) again found that phase effects in MDI showed considerable intersubject variability. With sinusoidal carriers separated by an octave, Bacon and Konrad (1993) found larger MDI with an out-of-phase modulated masker than with an in-phase masker. When the carriers were closer in frequency, the effect was reversed: subjects showed less MDI with the out-of-phase modulated masker. In an experiment with a two-component masker, Richards et al. (1997) found only small effects of relative phase in MDI. In all these studies, there was considerable intersubject variability in the pattern of results. Taken together, these studies suggest that across-channel phase-of-modulation cues do not play a large role in MDI, which argues against temporal-coherence-based grouping as a basis for MDI. On the other hand, Hall and Grose (1991) showed that asynchrony between the masker and the signal reduced MDI, a finding that supports a grouping account. Moore and Shailer (1992) proposed that MDI has two components: (1) a carrier-frequency-specific auditory grouping that is sensitive to asynchrony and (2) a modulation-specific mechanism that is less sensitive to asynchrony. The notion of a modulation-specific mechanism is supported by recent experiments by Gockel and Carlyon (2000) showing that even temporally nonoverlapping modulated maskers can produce MDI.

Unlike the normal-hearing system, cochlear implants provide a simple way to study temporal and spectral processing by the auditory system independently of each other, and hence to examine the individual contributions of temporal and spectral factors to the perception of electrical signals. Because of this unique feature, investigators have been able to study purely temporal processing of single-channel stimuli (Chatterjee et al. 1998) as well as the integration of temporal information across tonotopic space (McKay and McDermott 1996). In a multidimensional scaling (MDS) study, McKay et al. (1996) found that, for CI listeners, concurrent fixed-rate stimulation at two cochlear sites generated a pitch percept intermediate between the pitches produced by the individual component electrodes. McKay and Carlyon (1999) found that, for single-channel stimuli, modulated pulse trains generated pitch percepts that could be roughly scaled along two orthogonal dimensions corresponding to the carrier rate and the modulation rate of the stimuli; the salience of the modulation determined the combined pitch heard by the listener. Interestingly, McKay and Carlyon (1999) also found similar results with normal-hearing subjects listening to filtered acoustic click trains. The filtering eliminated spectral cues that might have been produced by resolved harmonics; thus, the subjects were forced to rely on temporal cues alone. Further experiments by McKay et al. (2000) showed that pulse-rate and place-of-stimulation cues were perceptually independent for CI listeners. The work of McKay and colleagues suggests that the tonotopic and temporal features of stimuli might be processed independently. However, most of the aforementioned experiments manipulated the tonotopic and temporal features separately: pitch perception was measured either with variable temporal cues at fixed cochlear locations or with fixed-rate pulse trains at variable cochlear locations. It is quite possible that, in two-channel stimulation with fluctuating envelopes on each channel, the tonotopic and temporal dimensions interact when those envelope fluctuations fall within the temporal processing capabilities of CI listeners.

In an MDI experiment, Richardson et al. (1998) found significant envelope interactions when CI listeners were asked to detect modulation in a pulse train in the presence of competing modulation on a second channel. More recently, Chatterjee (2003) observed significantly elevated modulation thresholds when CI subjects were asked to detect a 50-Hz modulated pulse train in the presence of a noise-amplitude-modulated (NAM) masker on a second channel. In that study, modulation detection thresholds were also measured using a steady-state masker (SSpeak) whose amplitude was equal to the peak of the NAM masker’s range of fluctuations. For each tonotopic masker location, the ratio between the modulation thresholds obtained with the NAM and the SSpeak maskers was defined as a conservative estimate of “envelope masking” (ρ). The SSpeak masker was important because it eliminated from the measure of ρ all sources of energetic masking, as well as other interactions due to peripheral overlap of excitation patterns. The amount of envelope masking varied with the tonotopic distance between the masker and signal electrode locations; it also depended on the noise fluctuation depth and was subject-dependent. However, ρ was not always largest when the masker and the signal were spatially coincident (i.e., presented to the same electrode pair); in fact, envelope masking was often minimal in this case. In the second part of the study, envelope masking was measured for two new tasks: detection of a 20-Hz modulation in the signal pulse train and detection of an intensity increment. The results suggested a hierarchy of envelope masking effects: ρ was largest for the 50-Hz modulation detection task, smaller for 20-Hz modulation detection, and smallest (sometimes negative) for the intensity-increment detection task. These results are compatible with the “modulation filterbank model” of Dau et al. (1997). According to that model, the flat-spectrum noise may excite a modulation filter centered at 50 Hz (which has a broader bandwidth in linear modulation frequency) more than a filter centered at 20 Hz, thus producing greater modulation masking of the 50-Hz modulated signal. In any case, the results of Chatterjee (2003) suggest that the noise masker shifts the low-pass cutoff of the modulation transfer function to lower frequencies than in quiet.
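The filterbank argument can be illustrated with a rough calculation. The sketch below assumes idealized rectangular modulation filters with a constant Q (Q = 2 is an illustrative value chosen here, not a parameter taken from this article) and a noise envelope spectrum that is flat up to 250 Hz, as for the NAM masker. Under these assumptions, the noise power captured by a filter is proportional to its bandwidth, so the 50-Hz filter collects 50/20 = 2.5 times (about 4 dB) more noise power than the 20-Hz filter, whatever constant Q is chosen, as long as both passbands lie below 250 Hz.

```python
import math


def noise_power_in_filter(fc_hz, q=2.0, noise_cutoff_hz=250.0, psd=1.0):
    """Noise power passed by an idealized rectangular constant-Q modulation filter.

    Illustrative assumptions: the passband has width fc/q centered on fc, and
    the noise envelope spectrum has constant density `psd` up to the cutoff.
    """
    bandwidth = fc_hz / q
    lo = max(0.0, fc_hz - bandwidth / 2)
    hi = min(noise_cutoff_hz, fc_hz + bandwidth / 2)
    return psd * max(0.0, hi - lo)


p20 = noise_power_in_filter(20.0)
p50 = noise_power_in_filter(50.0)
ratio_db = 10 * math.log10(p50 / p20)
print(f"50-Hz vs 20-Hz filter: {p50 / p20:.1f}x ({ratio_db:.1f} dB)")
```

The calculation ignores filter skirts and any internal noise, so it indicates only the direction and rough size of the effect, not a model prediction.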

In psychoacoustics, the term MDI has been used to describe across-channel envelope interactions, while the term “modulation masking” has been reserved for within-channel envelope interactions. Because channels are more difficult to define in cochlear implants, Chatterjee (2003) preferred to use the term “modulation masking” (defined as having components due to both peripheral, “tonotopic,” and central, “envelope,” interactions) to describe both within- and across-electrode interactions. Results obtained in that study suggested that within- and across-channel envelope interactions could be separated based on the different effects of the SSpeak and the NAM maskers. To maintain consistency with the psychoacoustics literature, this article refers to the across-channel interactions as MDI and the within-channel interactions as modulation masking (MM). The net effect is referred to as MDI/MM. We continue to use the term “envelope masking” to refer to the difference (in dB) in modulation detection thresholds between a dynamic-envelope masker and the corresponding SSpeak masker. This measure yields a conservative estimate of the interaction between the masker and signal envelopes: because it excludes contributions from energetic masking, it allows a more confident estimate of envelope processing at retrocochlear levels of the auditory system.

The previous study (Chatterjee 2003) was not designed to explore modulation tuning. The present study was undertaken to measure modulation tuning, if any, in CI listeners. The study also examined the effects of relative modulator phase and of the spread of excitation generated by the masker. As in Chatterjee (2003), the task was to detect a 50-Hz modulation in the signal carrier. Two kinds of maskers were used: dynamic and steady state. The dynamic maskers were modulated by either sinusoidal amplitude modulation (SAM) or uniformly distributed noise (NAM). The SAM frequencies were 20, 50, and 125 Hz. The effect of relative phase (0 and π) between masker and signal envelopes was also measured for the 50-Hz SAM masker. The steady-state maskers were either at the reference amplitude (SS), which was the same for the SAM and NAM maskers, or at the peak of the fluctuation range of the dynamic maskers (SSpeak). As in Chatterjee (2003), the SSpeak masker was expected to produce the maximum “within-channel” masking due to tonotopic overlap between masker and signal. The ratio ρ between the masked modulation detection thresholds obtained with each of the dynamic maskers and with the SSpeak masker provides a conservative index of the contribution of “envelope interaction” to the net masking.

The MDI/MM experiments in the present study provide one measure of channel interaction. To compare results with a more traditional measure, forward-masking patterns were measured in a subset of the subjects, using the same masker and signal electrode pairs. The masker was identical to the SS masker (steady-state pulse train at the reference amplitude). Forward-masked threshold shifts were normalized to the peak masking and compared against similarly normalized spatial patterns of masked modulation detection thresholds across masker electrodes. We interpret the forward-masking patterns as shaped primarily by peripheral elements of channel interaction such as excitation-pattern overlap. In contrast, we expect that spatial patterns of masked modulation detection thresholds reflect a further contribution by more central elements. A comparison of the two may be informative as to the relative contributions of peripheral and central mechanisms to channel interaction in CI subjects listening to dynamic stimuli.
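The normalization described above can be sketched as follows. Each pattern is expressed as a threshold shift (masked minus unmasked, in dB) at each masker position and then divided by its own peak, so that forward masking and masked modulation detection can be compared on a common 0 to 1 scale. The numbers are hypothetical placeholders, not data from this study, and taking the point-by-point difference between the two normalized patterns is only one possible way to compare them.

```python
def threshold_shifts(masked_db, unmasked_db):
    """Masking (threshold shift, dB) produced at each masker electrode pair."""
    return [m - unmasked_db for m in masked_db]


def normalize_to_peak(shifts_db):
    """Scale shifts so that the largest one equals 1.0 (peak-normalized pattern)."""
    peak = max(shifts_db)
    if peak <= 0:
        raise ValueError("no positive masking to normalize against")
    return [s / peak for s in shifts_db]


# Hypothetical thresholds at five masker positions (not data from this study).
fm_pattern = normalize_to_peak(threshold_shifts([41.0, 44.0, 50.0, 45.0, 42.0], 40.0))
mdi_pattern = normalize_to_peak(threshold_shifts([-20.0, -18.0, -22.0, -17.0, -21.0], -24.0))

# Positive values mark masker positions where modulation masking is relatively
# larger than the forward-masking (excitation-overlap) pattern would suggest.
excess = [m - f for m, f in zip(mdi_pattern, fm_pattern)]
print([round(x, 2) for x in excess])
```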

An interesting question regarding central processing mechanisms is the extent to which they are present in patients with prelingual deafness, i.e., whether their formation requires auditory input during the developmental years. Three of the subjects in the present study had profound hearing loss at an early age. We were therefore also interested in whether these subjects showed different patterns of results from the other, postlingually deafened subjects.

METHODS

MDI/MM experiments

Subjects

Nine adult users of the Nucleus–22 and Nucleus–24 cochlear implant systems participated in this study. Subjects S2, S3, S4, and S6 had several years of experience with previous psychophysical and speech perception experiments in the laboratory. Of these, S3, S4, and S6 had participated in the experiments reported by Chatterjee (2003) as well as in pilot experiments leading up to the present study. Subject S5 had limited previous experience with speech perception experiments. Subjects S1, S7, S8, and S9 had no previous experience with laboratory experiments and were given one session of training in modulation detection and other psychophysical tasks prior to data collection. Table 1 gives details for individual subjects, including their phoneme identification performance when listening to acoustic speech with their everyday speech processors in a sound-treated booth.

Table 1 Potentially relevant information about subjects

Stimuli

Both the masker and the signal were 200-ms-long, 500-Hz trains of biphasic current pulses (200 µs/phase). As in Chatterjee (2003), masker and signal pulses were interleaved in time, with a 1-ms delay between the onset of each masker pulse and the onset of the next signal pulse, to achieve concurrent but nonsimultaneous stimulation. A custom-built research interface (Shannon et al. 1990; Robert 2002) was used to deliver stimuli. In the Nucleus device, amplitudes increase from the lowest (~20 µA) to the highest level (~1500–1800 µA) in discrete steps (“clinical units,” or CUs); there are 238 CUs for the Nucleus–22 device and 256 CUs for the Nucleus–24 device. The exact amplitudes that correspond to each subject’s CUs were obtained from the manufacturer.
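The interleaving scheme can be checked with a short timing calculation. The sketch uses only the values stated above (500 pulses/s, 200 ms, 200 µs/phase, 1-ms masker-to-signal delay) and ignores any interphase gap, which is not specified here. It shows that each train contains 100 pulses and that about 0.6 ms of silence separates the end of every pulse from the start of the next pulse on the other channel.

```python
PULSE_RATE_HZ = 500
DURATION_MS = 200.0
PHASE_US = 200.0
INTERLEAVE_DELAY_MS = 1.0  # masker pulse onset -> next signal pulse onset


def onsets_ms(offset_ms=0.0):
    """Onset times (ms) of a 200-ms, 500-pulse/s train starting at offset_ms."""
    period_ms = 1000.0 / PULSE_RATE_HZ
    n_pulses = round(DURATION_MS / period_ms)
    return [offset_ms + i * period_ms for i in range(n_pulses)]


masker = onsets_ms()
signal = onsets_ms(INTERLEAVE_DELAY_MS)

# A biphasic pulse lasts two phases (the interphase gap is ignored here).
pulse_ms = 2 * PHASE_US / 1000.0
gap_after_masker = INTERLEAVE_DELAY_MS - pulse_ms
gap_after_signal = (1000.0 / PULSE_RATE_HZ) - INTERLEAVE_DELAY_MS - pulse_ms
print(len(masker), signal[:3], round(gap_after_masker, 3), round(gap_after_signal, 3))
```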

For both the Nucleus–22 and Nucleus–24 devices, the implanted electrode array consists of 22 intracochlear electrodes, spaced 0.75 mm apart and numbered 1–22 (base to apex). The masker and the signal electrode pairs were chosen to stimulate focused regions of the cochlea, with the requirement that a measurable dynamic range be obtained without exceeding 1 mA of current. All stimuli were presented using bipolar stimulation, meaning that one electrode served as the “active” electrode and the other as the “return” or ground electrode. In this article, bipolar electrode pairs are described using the form (X, Y), where X and Y are the numbers of the basal and apical members of the pair, respectively. Stimulation modes were BP + 2 or BP + 1 (electrode separation of 2.25 or 1.5 mm), depending on subjects’ thresholds and dynamic ranges.
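The (X, Y) notation and the quoted separations follow from a simple rule, sketched below under the common convention (assumed here, but consistent with the 2.25- and 1.5-mm values above) that BP+N leaves N unused electrodes between the active and return electrodes. With 0.75-mm spacing, BP+2 gives pairs such as (10, 13), (2, 5), and (18, 21).

```python
ELECTRODE_SPACING_MM = 0.75
N_ELECTRODES = 22


def bipolar_pair(active, mode):
    """Return (basal, apical) electrode numbers for a BP+mode pair.

    Assumes BP+mode leaves `mode` unused electrodes between the two members and
    that electrodes are numbered 1 (basal) to 22 (apical), with the basal member
    serving as the active electrode.
    """
    apical = active + mode + 1
    if active < 1 or apical > N_ELECTRODES:
        raise ValueError("pair falls outside the 22-electrode array")
    return (active, apical)


def separation_mm(mode):
    """Center-to-center distance between the two members of a BP+mode pair."""
    return (mode + 1) * ELECTRODE_SPACING_MM


print(bipolar_pair(10, 2), separation_mm(2), separation_mm(1))
```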

Masker and signal electrode locations

For subjects S3, S4, S5, and S6, the signal electrode pair (10, 13) was chosen to stimulate the central region of the electrode array, and nine masker electrode pairs were chosen to stimulate regions with various degrees of tonotopic overlap with the signal’s electric field. Because of time constraints, the remaining subjects were available for fewer experimental sessions, so fewer maskers were tested with them. In addition, subjects S1 and S2 had problems with facial nerve stimulation on several apical electrodes, so their electrode set was shifted toward the base. Table 1 lists the electrodes used for each subject.

An additional masker–signal configuration, using the two most widely separated electrode pairs [(2, 5) and (18, 21)], was tested with subjects S3, S4, S5, and S6. Each electrode pair alternately served as masker and signal, i.e., when (18, 21) was the masker, (2, 5) was the signal, and vice versa.

Masker and signal envelopes

Figure 1 shows schematics of the different masker and signal waveforms. The signal was always a 500-Hz pulse train modulated by a 50-Hz sinusoid in cosine phase. Modulation was applied to the pulse phase duration of the signal rather than to the pulse amplitude, because the discrete amplitude steps of the Nucleus device are logarithmically spaced: CI listeners are very sensitive to modulation above soft listening levels, so in some cases a single amplitude step may be larger than the smallest modulation the subject can detect. In contrast, the device allows much finer resolution in the pulse phase duration (steps of 0.4 µs).
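A minimal sketch of the signal waveform, assuming that the modulated phase duration of pulse n is the 200-µs reference scaled by 1 + m·cos(2π·50·t) and then rounded to the nearest 0.4-µs device step. The rounding step is an assumption about how the device quantizes durations; it shows why very small values of m produce no change at all, since a peak deviation smaller than half a step rounds back to 200 µs.

```python
import math


def phase_durations_us(m, fm_hz=50.0, ref_us=200.0, rate_hz=500.0,
                       n_pulses=100, step_us=0.4):
    """Per-pulse phase durations for a cosine-phase SAM signal.

    Assumes each ideal duration is rounded to the nearest device step.
    """
    durations = []
    for n in range(n_pulses):
        t = n / rate_hz
        ideal = ref_us * (1.0 + m * math.cos(2 * math.pi * fm_hz * t))
        durations.append(round(ideal / step_us) * step_us)
    return durations


deep = phase_durations_us(0.1)      # one 50-Hz cycle spans 10 pulses
tiny = phase_durations_us(0.0009)   # peak deviation below half a step
print(deep[:6], max(tiny), min(tiny))
```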

Figure 1

Schematics of masker and signal stimuli used in the experiments.

The masker was modulated at a fixed depth of 20%. Masker envelopes included sinusoidal amplitude modulation (SAM) at 20, 50, and 125 Hz, as well as noise amplitude modulation (NAM); the 50-Hz SAM masker was presented at 0 and π phase relative to the 50-Hz modulated signal. For all modulated maskers, modulation was applied to the pulse amplitude rather than to the pulse phase duration, so that masker and signal pulses could be interleaved in time without reducing the interpulse interval excessively. Because the modulation depth for all masker envelopes was fixed at 20% (much deeper than CI listeners’ modulation detection thresholds), the range of fluctuations was not limited by each subject’s minimum amplitude step size (1 CU).

In addition to the modulated maskers, measurements were made using steady-state maskers. Steady-state masker amplitudes were fixed either at the modulation masker reference amplitude (SS) or at the peak of the modulation masker fluctuations (SSpeak), i.e., 120% of the reference amplitude. We note here that the SS masker had less energy than any of the fluctuating maskers, while the SSpeak masker had more energy than any of the fluctuating maskers. The SSpeak masker provides a measure of the maximum masking due to tonotopic overlap of the masker with the signal.

For the additional test conditions using electrode pairs (2, 5) and (18, 21), only the 50-Hz SAM masker (both 0- and π-phase re: signal phase), the NAM masker, and the SSpeak masker were tested.

Noise generation for the NAM masker

The NAM masker was implemented by scaling successive pulses of the 100-pulse masker pulse train by pseudorandom numbers drawn from a uniform distribution ranging from 1 − r to 1 + r (mean 1.0), where r is the depth of the fluctuation (0 < r < 1.0). For the 20% depth used in these experiments, r = 0.2, so the amplitude fluctuated between 80% and 120% of the reference amplitude. The envelope spectrum of the noise was flat up to 250 Hz (see Chatterjee 2003 for further details).
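The masker envelopes described above can be sketched as per-pulse scale factors applied to the reference amplitude. This is an illustration, not the laboratory software: the SAM maskers are assumed to use cosine phase like the signal (the text specifies only the relative phase), and the mapping of scaled amplitudes onto clinical units is not modeled. Because each NAM pulse receives an independent draw at 500 pulses/s, the noise envelope is white up to the 250-Hz Nyquist frequency, matching the flat spectrum stated above. The mean-square values also illustrate the energy ordering: in expectation, SS (1.0) < NAM (1 + r²/3) < SAM (1 + r²/2) < SSpeak ((1 + r)² = 1.44), although a single 100-pulse NAM realization can fall slightly on either side of its expected value.

```python
import math
import random

RATE_HZ = 500.0
N_PULSES = 100        # 200 ms at 500 pulses/s
DEPTH = 0.2           # 20% masker modulation depth (r)


def sam_scale(fm_hz, phase_rad=0.0, depth=DEPTH):
    """Per-pulse amplitude factors for a SAM masker (cosine phase assumed)."""
    return [1.0 + depth * math.cos(2 * math.pi * fm_hz * n / RATE_HZ + phase_rad)
            for n in range(N_PULSES)]


def nam_scale(depth=DEPTH, seed=None):
    """Independent uniform draws in [1 - depth, 1 + depth], one per pulse."""
    rng = random.Random(seed)
    return [rng.uniform(1.0 - depth, 1.0 + depth) for _ in range(N_PULSES)]


def steady_scale(level):
    return [level] * N_PULSES


def mean_square(factors):
    """Relative energy per pulse (reference amplitude = 1.0)."""
    return sum(f * f for f in factors) / len(factors)


maskers = {
    "SS": steady_scale(1.0),
    "SSpeak": steady_scale(1.0 + DEPTH),
    "SAM20": sam_scale(20.0),
    "SAM50": sam_scale(50.0),
    "SAM50pi": sam_scale(50.0, math.pi),
    "SAM125": sam_scale(125.0),
    "NAM": nam_scale(seed=1),
}
for name, factors in maskers.items():
    print(name, round(mean_square(factors), 4))
```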

Procedures

(a) Dynamic range and current level. Detection thresholds were obtained for each unmodulated carrier as described in subsection (b), “Threshold measurement,” below. The upper limit of each electrode pair’s dynamic range, termed the “Conservative Maximum Acceptable Level” (CMAL), was obtained as follows. The subject was asked to increase the current level of the pulse train (by clicking a button on the computer display or by pressing the “up arrow” key on the keyboard) until the sound reached the upper limit of the comfortable loudness range. This CMAL measurement was repeated three times, and the mean was calculated for each subject. CI subjects typically make very conservative judgments with this method. In the previous study (Chatterjee 2003), this method was supplemented with another loudness estimation task, which often yields significantly higher upper limits; because of time constraints, this supplemental procedure was not used in the present study. The dynamic range for each electrode pair was defined as the difference, in microamperes, between the mean CMAL and the threshold. The reference amplitude for all masker and signal electrode pairs was fixed at 50% of their individual dynamic range (in microamperes), which corresponded to a fairly soft but comfortable listening level for each channel. Figure 2 shows the thresholds, CMALs, and 50% dynamic range reference amplitudes (in microamperes) for each subject’s experimental electrode pairs. Note that, because of these methodological differences, the 50% dynamic range reference amplitude may have produced a lower loudness level in the present study than in the previous one. The three subjects who had also participated in the earlier study reported lower loudness levels for the stimuli used in the present study.
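The level-setting rule can be written out as a small calculation. The sketch reads “50% of the dynamic range” as the point halfway between threshold and mean CMAL on a linear microampere scale (an interpretation, not a quotation of the authors’ software), and the threshold and CMAL values are hypothetical. The SSpeak level is then 120% of that reference; conversion to the nearest clinical unit is device-specific and is not modeled.

```python
def reference_amplitude_ua(threshold_ua, cmal_trials_ua, fraction=0.5):
    """Reference level: threshold plus a fraction of the linear dynamic range (µA)."""
    mean_cmal = sum(cmal_trials_ua) / len(cmal_trials_ua)
    dynamic_range = mean_cmal - threshold_ua
    if dynamic_range <= 0:
        raise ValueError("mean CMAL must exceed threshold")
    return threshold_ua + fraction * dynamic_range


# Hypothetical electrode pair: threshold 200 µA, three CMAL repetitions.
ref = reference_amplitude_ua(200.0, [600.0, 620.0, 640.0])
ss_level = ref               # SS masker at the reference amplitude
sspeak_level = 1.2 * ref     # SSpeak masker at the peak of the 20% fluctuations
print(ref, round(sspeak_level, 1))
```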

Figure 2

Threshold, CMAL, and the 50% dynamic range point in microamperes (µA) for all experimental electrode pairs, for all subjects. The active (basal) member of each pair is indicated along the abscissa.

(b) Threshold measurement. All thresholds were measured using a 3-down/1-up adaptive procedure, which theoretically converges on the 79.4%-correct point of the psychometric function (Levitt 1971). Each run contained 8 to 10 reversals and was terminated after no more than 55 trials. The step size was reduced after the first four reversals. The result for each run was the mean (and standard deviation) of the last 5–7 reversals, i.e., the first 3 reversals were discarded. In all cases, trial-by-trial feedback (correct/incorrect) was provided to the subject.
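The adaptive rule can be sketched as follows. The 79.4% target comes from requiring three consecutive correct responses to have a probability of 0.5, so p = 0.5^(1/3). The class implements the stopping and averaging rules as stated above (stop at 10 reversals or 55 trials, reduce the step after four reversals, discard the first three reversals); the listener used in the demonstration is hypothetical and deterministic, whereas real listeners respond probabilistically.

```python
import statistics


class ThreeDownOneUp:
    """3-down/1-up adaptive track: the level drops after three consecutive
    correct responses and rises after any incorrect response."""

    def __init__(self, start, initial_step, final_step,
                 max_reversals=10, max_trials=55, big_step_reversals=4):
        self.level = start
        self.initial_step = initial_step
        self.final_step = final_step
        self.max_reversals = max_reversals
        self.max_trials = max_trials
        self.big_step_reversals = big_step_reversals
        self.trials = 0
        self.correct_run = 0
        self.direction = 0       # -1 falling, +1 rising, 0 before the first move
        self.reversals = []      # levels at which the track changed direction

    def step(self):
        # Large steps until the first four reversals, small steps afterwards.
        if len(self.reversals) < self.big_step_reversals:
            return self.initial_step
        return self.final_step

    def done(self):
        return len(self.reversals) >= self.max_reversals or self.trials >= self.max_trials

    def update(self, correct):
        self.trials += 1
        if correct:
            self.correct_run += 1
            if self.correct_run < 3:
                return            # need three in a row before stepping down
            move = -1
        else:
            move = +1
        self.correct_run = 0
        if self.direction and move != self.direction:
            self.reversals.append(self.level)
        self.direction = move
        self.level += move * self.step()

    def estimate(self, n_discard=3):
        kept = self.reversals[n_discard:]
        return statistics.mean(kept), statistics.stdev(kept)


# Theoretical convergence point: P(three correct in a row) = 0.5.
p_target = 0.5 ** (1 / 3)

# Hypothetical deterministic listener: correct whenever the level exceeds 5.
track = ThreeDownOneUp(start=10.0, initial_step=1.0, final_step=0.5)
while not track.done():
    track.update(track.level > 5.0)
mean, sd = track.estimate()
print(round(p_target, 3), track.trials, track.reversals, round(mean, 3))
```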

Absolute detection thresholds were obtained for a 200-ms, 500-Hz, 200-µs/phase pulse train on each electrode pair using a 2-interval forced-choice procedure. Initial and final step sizes were 1 and 0.5 dB, respectively. Each threshold was measured twice; if the two measurements differed by more than 30 µA, a third measurement was made. The detection threshold was the mean of all measurements.
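The repeat-and-average rule above is simple enough to state as a function. In this sketch, `measure` is a hypothetical callable standing in for one complete adaptive run that returns a threshold in microamperes.

```python
def absolute_threshold_ua(measure, tolerance_ua=30.0):
    """Average two threshold runs, adding a third if they differ by > tolerance.

    `measure` is a hypothetical callable that performs one adaptive run and
    returns a threshold in microamperes.
    """
    runs = [measure(), measure()]
    if abs(runs[0] - runs[1]) > tolerance_ua:
        runs.append(measure())
    return sum(runs) / len(runs)


agree = absolute_threshold_ua(iter([100.0, 120.0]).__next__)           # within 30 µA
disagree = absolute_threshold_ua(iter([100.0, 140.0, 130.0]).__next__)  # third run added
print(agree, round(disagree, 2))
```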

A 3-interval forced-choice (3IFC) procedure was used to obtain modulation detection thresholds. The adapted variable was the modulation index (m), adjusted in linear steps. Initial and final step sizes depended on the subject’s modulation sensitivity, as estimated from pilot data: typically, if the expected threshold m was less than 0.1, initial and final step sizes were 0.02 and 0.005, respectively; if it exceeded 0.1, they were 0.04 and 0.01, respectively. The reference phase duration was fixed at 200 µs/phase; for this reference, the minimum measurable value of m is 0.002 for the Nucleus–22 device and 0.001 for the Nucleus–24 device. At least 4 runs were conducted for each condition, and the mean and standard deviation across all runs were taken as the modulation detection threshold. This 3IFC procedure was used to measure both masked and unmasked modulation detection thresholds.
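The Nucleus–22 floor on m follows directly from the 0.4-µs phase-duration step: one step relative to the 200-µs reference is 0.4/200 = 0.002. The sketch below also converts m to decibels using the common 20·log10(m) convention, which is assumed here because this section does not state the convention used for the dB thresholds in the figures, and it encodes the pilot-based step-size rule (treating exactly m = 0.1 as “above” is an arbitrary choice).

```python
import math

PHASE_STEP_US = 0.4   # Nucleus-22 phase-duration resolution quoted in the text
REFERENCE_US = 200.0


def min_modulation_index(step_us=PHASE_STEP_US, ref_us=REFERENCE_US):
    """Smallest nonzero m: one duration step relative to the reference phase."""
    return step_us / ref_us


def modulation_db(m):
    """Modulation index in dB, assuming the common 20*log10(m) convention."""
    return 20.0 * math.log10(m)


def staircase_steps(expected_m):
    """(initial, final) step sizes for m, chosen from the pilot-data estimate.

    m == 0.1 exactly is treated as 'above 0.1' (an arbitrary choice).
    """
    return (0.02, 0.005) if expected_m < 0.1 else (0.04, 0.01)


print(min_modulation_index(), round(modulation_db(min_modulation_index()), 1))
```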

Modulation detection thresholds were also obtained on each masker electrode pair individually, at modulation frequencies of 20, 50, and 125 Hz. Detection thresholds for noise modulation were also measured. In all cases, the modulation was applied to the pulse phase duration.

Forward masking

Forward-masked thresholds were obtained in subjects S3, S4, S5, and S6 using the following methods. The maskers were identical to the SS maskers used in the MDI/MM experiments. Thus, all forward maskers were 200-ms-long, 500-Hz, 200-µs/phase pulse trains presented at 50% of each electrode’s dynamic range. The masker-offset to probe-onset delay was fixed at 9.58 ms. The probe was a 20-ms-long, 500-Hz, 200-µs/phase pulse train presented to electrode pair (10, 13). The forward masker was presented to each of the 9 electrode pairs used in the MDI/MM experiments. Detection thresholds were measured for both the masked and the unmasked probes using a 3IFC, 3-down/1-up procedure. Initial and final step sizes were 1.0 and 0.5 dB, respectively.

RESULTS

Effects of masker location and envelope type

Figure 3 shows modulation detection thresholds for individual subjects as a function of masker electrode location, with masker envelope as the parameter. In each of the nine panels, masked modulation thresholds (in dB) are plotted against masker electrode number (basal member of each masker electrode pair). The different symbols correspond to the different masker envelopes (SS, SSpeak, 20 Hz, 50 Hz, 125 Hz, 50-Hz/π-phase SAM, and NAM). The downward-pointing arrow marks the signal electrode location for each subject. The range of masked thresholds obtained with the different maskers varies greatly across subjects, being widest for subjects who have low unmasked signal modulation detection thresholds and narrowest for subjects who have high unmasked modulation detection thresholds. In each plot, the horizontal solid line corresponds to the unmasked 50-Hz modulation threshold for the signal electrode pair. Subjects S7 and S8 were not available for the 50-Hz/π-phase masker condition.

Figure 3

Mean modulation detection thresholds as a function of masker electrode location. Each panel corresponds to an individual subject. Within each panel, the parameter is masker envelope type. The arrow indicates the location of the signal electrode for each subject. The horizontal line in each panel indicates the unmasked 50-Hz modulation detection threshold for the signal. Error bars show ± 1 SD.

With one exception (subject S2), the SS masker produced the least amount of masking for all masker electrode locations. This is not surprising, as it had the least energy of all the maskers. For all subjects, the SSpeak masker produced more masking than the SS masker, and masked modulation thresholds in the presence of the SSpeak masker were at peak levels when masker and signal were presented to the same electrode pair. In general, the SS and the SSpeak maskers produced similarly low levels of masking at large masker–signal distances. For all subjects, the dynamic-envelope maskers produced masking that was equal to (i.e., not significantly different from) or greater than that produced by the SSpeak masker (with the single exception of subject S2 under the 50-Hz modulated masker condition). These results are generally consistent with those obtained by Chatterjee (2003) with a smaller group of subjects under NAM masking conditions. Recall that the SSpeak masker has more energy than any of the modulated maskers.

As in Chatterjee (2003), the amount of “envelope masking” refers to the difference (in dB) between the masked thresholds obtained with each of the fluctuating maskers and the corresponding SSpeak masker. This difference for all subjects is shown in Figure 4 as a function of masker electrode position with masker envelope as the parameter. Statistically significant (Student’s t-test, p < 0.05) amounts of envelope masking are indicated by filled symbols in each case.
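The envelope-masking measure and its significance test can be sketched for one masker position. The per-run thresholds below are hypothetical, and the sketch assumes an independent-samples (pooled-variance) Student’s t-test on the individual runs, which this section does not specify in detail. Converting the mean dB difference back to a linear ratio gives the corresponding value of ρ, provided the dB thresholds are 20·log10(m). For 6 degrees of freedom, the two-tailed 5% critical value of t is about 2.447; `scipy.stats.ttest_ind` with its default `equal_var=True` computes the same statistic together with a p value.

```python
import math
import statistics


def envelope_masking_db(dynamic_db, sspeak_db):
    """Mean masked threshold with a dynamic masker minus that with SSpeak (dB)."""
    return statistics.mean(dynamic_db) - statistics.mean(sspeak_db)


def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical per-run masked thresholds (dB re m = 1) at one masker position.
nam_runs = [-20.0, -19.0, -21.0, -20.0]
sspeak_runs = [-25.0, -24.0, -26.0, -25.0]

em_db = envelope_masking_db(nam_runs, sspeak_runs)
rho = 10 ** (em_db / 20)   # mean dB difference expressed as a ratio of m values
t, df = student_t(nam_runs, sspeak_runs)
print(em_db, round(rho, 2), round(t, 2), df)
```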

Figure 4

Envelope masking (masked thresholds – SSpeak masked thresholds) as a function of masker electrode location. Each panel corresponds to an individual subject. Within each panel, the parameter is masker envelope type. The filled symbols indicate significantly elevated modulation thresholds relative to those measured with the SSpeak masker (Student’s t-test, p < 0.05). Arrows indicate the location of the signal.

Significant envelope masking was observed in all subjects, but the pattern of results varied considerably across subjects. For all subjects, the 20- and 50-Hz modulated maskers produced the largest amount of envelope masking. Generally, the 125-Hz modulated masker produced the least amount of envelope masking. The 50-Hz/π-phase masker had variable effects across subjects: in five of the seven subjects tested, it produced the peak amount of masking at most masker electrode locations.

For subjects S1, S2, S4, S5, S6, S7, and S8, the minimum amount of envelope masking generally occurred when the masker and the signal channels overlapped spatially. The maximum amount of envelope masking was most often found off-channel, i.e., when the masker and the signal did not overlap spatially. These trends were consistent with those observed by Chatterjee (2003).

Subject S2’s results with the 50-Hz modulated masker were exceptional and surprising. This subject showed no significant masking effects at several electrode locations. However, all the remaining maskers produced masking effects. In fact, the 50-Hz/π-phase masker was the most effective of all in this subject. We can offer no explanation for this observation at this time.

For each subject, a two-factor analysis of variance (ANOVA) with replication was performed on the masked modulation thresholds obtained with all maskers except the SS masker; the two factors were envelope type and masker electrode location. In a few cases, more than the standard 4 repetitions had been run. For the ANOVA, the overall mean of all repetitions was calculated, and the 4 repetitions closest to that overall mean were included. Note, however, that the means and standard deviations shown in all other plots and used for other analyses were calculated from all runs, not just the 4 closest to the overall mean.
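The selection of repetitions for the ANOVA can be sketched as below. The run values are hypothetical; ties in distance keep the original run order, which is an arbitrary choice made here.

```python
import statistics


def closest_to_mean(runs, k=4):
    """Return the k runs whose values lie closest to the mean of all runs."""
    if len(runs) < k:
        raise ValueError(f"need at least {k} runs")
    overall = statistics.mean(runs)
    # sorted() is stable, so equally distant runs keep their original order.
    return sorted(runs, key=lambda v: abs(v - overall))[:k]


# Hypothetical thresholds (dB) from five runs of one condition.
runs = [-20.0, -18.0, -22.0, -21.0, -30.0]
print(closest_to_mean(runs))
```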

The main effects of envelope type and masker electrode location were highly significant for all subjects (the largest p value was 0.002 × 10⁻⁶). The tests also revealed significant interactions between the two factors (p values ranged from 0.008 to 0.005 × 10⁻¹¹).

In general, these results show that dynamic maskers are likely to be more effective than steady-state maskers in a modulation detection task, even when the steady-state masker has more energy (as does the SSpeak masker).

Effects of modulator phase

When masker and signal were modulated at the same frequency (50 Hz), some significant effects of masker modulator phase (relative to the signal modulator phase) were observed. Figure 5 shows the “phase effect,” calculated as the difference (in dB) between the masked thresholds obtained with the π-phase and the 0-phase modulated maskers. Statistically significant differences (Student’s t-test, p < 0.05) are represented by the filled symbols. The results show that reversing the modulation phase between the masker and the signal had diverse effects on the subjects’ masked modulation thresholds. Some subjects found it easier to detect the signal modulation in the presence of the π-phase masker, while others found it harder. The effect also depended on the masker–signal electrode distance, reaching a valley or a peak near the signal location. It is possible that subjects used different strategies to perform the task. Some subjects may have “listened in the valleys” of the masker modulation when it was out of phase with the signal, particularly when the masker and signal were spatially proximate. Note that subject S2’s masked modulation thresholds were most affected by the π-phase modulated masker, while the 0-phase masker produced the least amount of envelope masking.

Figure 5

Effect of relative phase (masked thresholds with π-phase masker – masked thresholds with 0-phase masker) as a function of masker electrode location. Each panel corresponds to an individual subject. The symbols with inner fill indicate statistically significant effects between the maskers (Student’s t-test, p < 0.05). Arrows indicate the location of the signal.

Envelope masking patterns compared with modulation thresholds

One factor that may be important in defining envelope interactions between channels is the relative perceptual salience of the masking envelope. Here, we loosely define salience as the perceptual strength of the modulation. We speculate that the salience of the modulation percept varies across electrode locations within subjects, as well as across subjects. For individual subjects, the detection threshold for each masker envelope at each masker electrode location may indicate the perceptual salience of that envelope and may also predict the amount of envelope masking it produces. As described in the Methods section, modulation detection thresholds were measured for each masker electrode and envelope type.

A correlation analysis was performed for each subject between the modulation detection thresholds for each envelope at each masker location and the amount of envelope masking produced by each envelope at each masker location. Of the nine subjects, a significant correlation was observed only for subjects S4 (r = 0.64) and S9 (r = 0.60).

Although most subjects’ sensitivity to a particular masker envelope was not well correlated with the amount of masking produced by that envelope, a trend in the data suggested that subjects who were more sensitive to envelope fluctuations in general were more susceptible to envelope masking. To test this hypothesis, a correlation analysis was performed between subjects’ mean modulation threshold (across all electrodes and envelopes) and their mean envelope masking (across all masker electrodes and envelopes). This correlation, shown as the solid line in Figure 6, was significant but not strong (r = 0.7424, p < 0.05). Because some of subject S2’s envelope masking results appeared atypical, the correlation was recalculated with S2 excluded. This recalculated correlation (dotted line in Fig. 6) remained significant but not strong (r = 0.7822, p < 0.05).
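
The across-subject correlation, with and without one subject, can be sketched with a plain Pearson coefficient. The function names and the dictionary layout keyed by subject ID are hypothetical; `t_for_r` is the standard t statistic on n − 2 degrees of freedom for testing whether a correlation differs from zero, and is not necessarily the test the authors used.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length lists with at least 3 points")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def subject_correlation(thresholds, masking, exclude=()):
    """Correlate per-subject means, optionally leaving some subjects out.

    thresholds, masking: dicts keyed by subject ID (e.g., "S2").
    """
    ids = [s for s in thresholds if s not in exclude]
    return pearson_r([thresholds[s] for s in ids], [masking[s] for s in ids])


def t_for_r(r, n):
    """t statistic (n - 2 df) for testing whether a correlation differs from 0."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)
```

Excluding a subject is then `subject_correlation(thresholds, masking, exclude={"S2"})`. As a check on the reported values, r = 0.7424 with n = 9 subjects gives t ≈ 2.93 on 7 degrees of freedom, which exceeds the two-tailed 0.05 critical value of about 2.365.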

Figure 6

Average envelope masking versus average modulation thresholds for the nine subjects. The averages were taken across all masker electrode locations and envelopes. The regression lines and the correlation coefficients correspond to data sets that either included subject S2 in the analysis (solid lines) or excluded S2 from the analysis (dotted line).

One factor that could confound the analyses above is the rate of loudness growth, or dynamic range, of the individual subjects for individual masker electrodes. A narrow dynamic range would imply that the fixed 20% modulation depth of the masker results in more perceptible modulation than the same modulation depth used with a wide dynamic range (shallower loudness growth). A correlation analysis between the dynamic range on each masker electrode and the envelope masking it produced showed no significant correlations for any of the subjects. Thus, the dynamic range variations across electrodes cannot account for the variations in envelope masking.

These observations suggest two possibilities. First, the salience of a masker’s suprathreshold modulation may not be well predicted by the masker’s modulation detection threshold. Second, envelope masking may depend on the relative salience of the masker and signal modulations combined with some other factor, such as the masker–signal electrode distance. If these two factors interact strongly, the contribution of envelope salience may be difficult to detect in a correlation analysis.

Average envelope masking pattern

For each subject, a one-way ANOVA was performed on the SAM envelope masking data with envelope type as the factor (pooling masker electrode locations). With the exception of subject S2, those subjects who showed significant effects of masker modulation frequency (p < 0.05) also showed significantly larger envelope masking for the 50-Hz modulated masker than for the 125-Hz modulated masker (Student’s t-test, p < 0.05). Only subject S4 showed a significant difference between envelope masking with the 20-Hz and the 50-Hz modulated maskers (the 20-Hz masker was more effective). The average envelope masking produced by each of the 20-, 50-, and 125-Hz modulated maskers and by the NAM masker was calculated across all masker electrode locations. Figure 7 shows this averaged envelope masking pattern (envelope masking versus envelope type) for each subject. The average envelope masking for the NAM masker is shown by the symbol at the right-hand end of each plot. For at least 7 of the 9 subjects, the envelope masking pattern has a low-pass-filter shape within the limited range of modulation frequencies used; however, only 5 of these are statistically significant. Plots for the subjects who showed significant effects in the statistical tests are drawn with bold solid lines and filled symbols. The remaining subjects did not show significant effects of envelope type on envelope masking. The lack of significance in subjects S7 and S8 may be partly due to the smaller number of masker electrode locations used for these subjects.
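
The one-way analysis and the averaged masking pattern can both be derived from the same grouped data, as in the sketch below. The function name `one_way_anova`, the dictionary layout, and the toy values are hypothetical; the group means correspond to the per-envelope averages plotted in a pattern like Figure 7, and the F ratio would still need an F-distribution routine to yield a p value.

```python
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA with envelope type as the single factor.

    groups maps an envelope label (e.g., "20 Hz") to the envelope masking
    values (dB) pooled over masker electrode locations.
    Returns (group_means, F, df_between, df_within).
    """
    means = {label: mean(vals) for label, vals in groups.items()}
    all_vals = [v for vals in groups.values() for v in vals]
    grand = mean(all_vals)
    k, n_total = len(groups), len(all_vals)

    ss_between = sum(
        len(vals) * (means[label] - grand) ** 2 for label, vals in groups.items()
    )
    ss_within = sum(
        (v - means[label]) ** 2 for label, vals in groups.items() for v in vals
    )
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return means, f, df_between, df_within
```

For instance, toy groups `{"20 Hz": [4.0, 6.0], "50 Hz": [7.0, 9.0], "125 Hz": [1.0, 3.0]}` give group means of 5, 8, and 2 dB and F = 9 on (2, 3) degrees of freedom. These values are illustrative, not data from this study.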

Figure 7

Average envelope masking (across masker electrode locations) as a function of masker envelope for all subjects. The plots with solid lines and filled symbols correspond to subjects whose modulation thresholds were significantly affected by the masker envelope (see text for details).

We note that these results show little indication of modulation tuning; i.e., we do not observe a peak in the envelope masking pattern at the signal modulation frequency. A more tuned function might emerge if maskers with lower modulation frequencies were also used.

Comparison with spatial forward-masking patterns

The forward-masked threshold shift was calculated as the difference (in microamperes) between the masked and unmasked detection thresholds for the probe. The threshold shifts were then normalized to the maximum threshold shift. To compare the normalized forward-masking patterns with the spatial distribution of masked modulation thresholds, a similar normalization was applied to each subject’s masked modulation threshold shifts (i.e., they were normalized to the peak masking for each masker envelope). Figure 8 shows the normalized spatial patterns for the four subjects (S3, S4, S5, and S6). For all subjects, the forward-masking pattern (filled circles) peaks at the location of the signal and decays on either side. The width of the pattern varies somewhat across subjects. The remaining plots in each panel show the normalized masked modulation thresholds for the different masker envelope conditions. With the exception of subject S3, the forward-masking patterns are qualitatively consistent in shape with the MDI/MM spatial patterns. In subjects S4, S5, and S6, MDI/MM patterns obtained with the dynamic maskers are more broadly tuned to the signal location than the spatial patterns obtained with the SSpeak maskers. This comparison shows that the shape and spatial width of a perceptual “channel” depend on the specific stimuli and on the task.
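
The normalization step amounts to computing a threshold shift at each masker electrode and dividing by the largest shift, as in this sketch. The function name `normalized_pattern` and the electrode-keyed dictionaries are hypothetical; the same routine would apply to forward-masked probe thresholds (µA) or to masked and unmasked modulation thresholds (dB).

```python
def normalized_pattern(masked, unmasked):
    """Threshold shift at each masker electrode, normalized to the peak shift.

    masked, unmasked: dicts mapping electrode location -> threshold
    (e.g., probe thresholds in uA for forward masking).
    Returns shift / max(shift) for each electrode.
    """
    shifts = {e: masked[e] - unmasked[e] for e in masked}
    peak = max(shifts.values())
    if peak <= 0:
        raise ValueError("normalization needs a positive peak threshold shift")
    return {e: s / peak for e, s in shifts.items()}
```

With hypothetical masked thresholds of 110, 140, and 120 µA at three electrodes and an unmasked threshold of 100 µA at each, the normalized pattern is 0.25, 1.0, and 0.5, peaking at the electrode with the largest shift.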

Figure 8

Normalized forward-masked threshold shifts and modulation thresholds as a function of masker electrode location. Each panel corresponds to an individual subject. The filled circles and solid lines show the forward-masking patterns; the open symbols and dotted lines show the spatial extent of modulation threshold shifts for all masker envelopes. Normalization within each plot was performed by dividing each point by the maximum.

Envelope masking for large spatial separations

In this experiment, the two electrode pairs that were farthest apart [(2, 5) and (18, 21)] were selected, and each served as signal and masker in turn. Stimuli, levels, and procedures were identical to those used in the previous experiments. The task was to detect a 50-Hz modulation in the signal in the presence of the masker. Only three masker modulators were used: 50-Hz SAM at 0 and π phase, and NAM. Subjects S3, S4, S5, and S6 participated in this experiment. Figure 9 shows the results as masked modulation detection thresholds for each masker type. Masked thresholds that significantly exceeded those obtained with the SSpeak masker (Student’s t-test, p < 0.05) are indicated by filled symbols. The results indicate that, even when the masker and the signal are as far apart as 16 electrodes (12 mm), significant interaction can occur, particularly for dynamic maskers. However, the interaction is not necessarily reciprocal: even when unmasked modulation detection thresholds are very similar on the two channels (e.g., subject S4), envelope masking may be greater when one of the two electrode pairs is the masker than vice versa.

Figure 9

Modulation detection thresholds for widely spaced electrodes as a function of masker type for a 50-Hz SAM signal. The squares indicate modulation thresholds with an apical masker and a basal signal and the circles indicate modulation thresholds with a basal masker and an apical signal. The symbols with inner fill show significantly larger (Student’s t-test, p < 0.05) masked modulation thresholds, relative to thresholds measured with the SSpeak masker. Each panel corresponds to an individual subject.

Auditory experience and training effects

Of the three subjects who had profound hearing loss at an early age, subject S9 uses a hearing aid in his non-implanted ear, whereas subjects S7 and S8 had no auditory experience at all prior to implantation. Our results showed no obvious differences in either the magnitude or the pattern of envelope masking that could be attributed to the extent of auditory experience before implantation. Although this was not the primary focus of this study, these results suggest that auditory experience is not a prerequisite for the mechanisms underlying across-channel temporal interactions of the kind reported here.

Training effects have been reported in the MDI literature (Hall and Grose 1991). Three of the subjects in the present study had participated in similar experiments reported by Chatterjee (2003) and in the pilot experiments preceding the present study. We did not observe reductions in the amount of MDI with experience for any of these subjects. Note, however, that none of the subjects had been trained on the specific conditions of this study before data collection.

DISCUSSION

The present study indicates that there are significant across-electrode envelope interactions in cochlear implant listeners, consistent with the findings of the previous study by Chatterjee (2003). The two studies differed substantially in the number of participants, the number of experimental conditions, and the stimulus presentation level (presentation levels in the present study were most likely softer because of methodological differences in estimating each electrode’s dynamic range). In addition, the signal and the maskers stimulated broader regions and overlapped more in Chatterjee (2003) than in the present study. Despite these potential sources of variation, both studies showed that envelope masking was not always greatest when the masker and signal excitation areas overlapped the most; in many instances, envelope masking was larger off-channel than on-channel. The results of the final experiment using widely separated stimuli showed that envelope interactions can occur across very broad regions of the cochlea, reinforcing the notion of centrally mediated, across-channel temporal processing. The envelope interactions in the final experiment were not always reciprocal between the two electrode locations. In general, modulation sensitivity at individual electrode positions did not predict the amount of envelope masking: in 7 of the 9 subjects, modulation detection thresholds on the masker electrodes were not significantly correlated with the envelope masking they produced. At the very least, the present study shows that the effectiveness of a masker envelope depends on factors other than the envelope’s salience.

The present results suggest that envelope type and masker electrode location interact to determine the amount of envelope masking. In several subjects, the 20- and 50-Hz modulated maskers produced the most envelope masking and the 125-Hz modulated masker produced the least. However, individual subjects differed considerably in their masking patterns, and the patterns varied significantly across masker locations within subjects. The most salient masker envelopes did not always produce the most envelope masking. For 7 of the 9 subjects, there was no significant correlation between a particular masker’s modulation detection threshold and the envelope masking it produced. However, a mild correlation (r = 0.74) was found between subjects’ overall modulation sensitivity and their average amount of envelope masking (both averaged across all masker envelope types and locations). Thus, the subjects who were generally more sensitive to modulation tended to be more susceptible to envelope masking.

The data presented here do not directly support modulation tuning of the kind reported in the psychoacoustics literature for MM and MDI (Houtgast 1990; Ewert and Dau 2000; Ewert et al. 2002). However, the hypothesized modulation-tuned filters are expected to be very broadly tuned. If they are even broader in electrical stimulation for some reason, the range of modulation frequencies used here may have been too narrow to capture the tuning. The overall shape of the envelope masking pattern observed here appears to be low-pass rather than bandpass. Although it was tempting to relate this low-pass shape to the low-pass modulation transfer function (MTF) often observed in CI listeners, we were unable to find consistent correlations between the two in individual subjects.

We found two kinds of effects when the masker and the signal were modulated at 0- and π-phase relative to each other at the same frequency: For some subjects, the π-phase masker made it more difficult to detect the signal than the 0-phase masker, while for others, the π-phase masker made it easier to detect the signal than the 0-phase masker. Depending on the subject, the phase effect had either a peak or a valley when the masker and the signal overlapped each other. However, in all cases, the π-phase masker produced significant envelope masking. The large intersubject variability in the results is similar to that observed in normal-hearing listeners. Thus, although envelope phase cues are important, they are not consistently used by subjects in MDI/MM tasks.

The forward-masking patterns observed in subjects S3, S4, S5, and S6 provide a reference measure of the spatial channel interaction for the stimuli used in the present experiments. In general, the modulated maskers created more broadly distributed patterns of interaction across electrodes. We speculate that forward-masking patterns reflect relatively lower-level (more peripheral) processes and can be considered an indicator of spatial pattern overlap (energetic masking region). The spatial patterns of MDI/MM with the dynamic-envelope maskers are likely to involve higher-level (more central) processes. We note here that the forward masker was identical to the SS masker and thus at a lower presentation level than the SSpeak masker. However, Chatterjee and Shannon (1998) found that masker level did not greatly influence forward-masking patterns.

To keep the SSpeak masker within a comfortable loudness range, the masker modulation depth used in these experiments was fixed at 20%. Chatterjee (2003) found that envelope masking varied considerably with masker fluctuation depth. It is likely that some of the effects observed also depend on masker fluctuation depth. Further experiments are necessary to quantify these dependencies.

The fact that we find evidence for centrally based, across-channel temporal interactions in CI listeners suggests that these mechanisms are preserved in electrical stimulation. Furthermore, the limited observation that the extent of auditory experience does not greatly influence the pattern of results indicates that such mechanisms may be “hard-wired” in the auditory system.

In everyday listening conditions that involve more than one channel of stimulation, CI listeners are likely to experience some degree of envelope masking similar to that observed in the present study. CI listeners generally have significant difficulty following speech in noisy or fluctuating backgrounds (e.g., Fu et al. 1998; Friesen et al. 2001). The results of the present study support previous work showing that channel interaction may be significantly increased when dynamic stimuli are presented and the task involves detecting a dynamic change. It has recently been shown that modulation sensitivity is a good predictor of phoneme recognition by CI listeners (Fu 2002). The present results may therefore apply directly to speech perception by CI listeners in noisy situations. (We note here that factors such as auditory experience prior to deafness can outweigh temporal resolution in predicting speech perception performance. For example, subject S7 in the present study has excellent modulation sensitivity but poor speech recognition.) In experiments on speech perception in modulated noise, Kwon and Turner (2001) found that normal-hearing listeners suffered considerably more interference when listening to bandlimited speech than to full-bandwidth speech. Further, under these conditions of reduced spectral redundancy, Kwon and Turner found that MDI was larger when the masker was off-channel than when it was on-channel, a result qualitatively similar to some of the results of the present study as well as those of Chatterjee (2003). Kwon and Turner (2001) suggest that when the interrupted noise is closer to the signal, the subjects are able to “listen in the valleys” and, therefore, MDI is reduced. In our experiments, however, the lack of consistent effects of relative modulator phase suggests that subjects were not able to “listen in the valleys” consistently. Rather, the minimum in envelope masking that was more consistently observed as the masker and signal electrodes approached each other seems to be largely due to the dominance of within-channel interactions and “energetic” masking.

CONCLUSION

This study measured MDI and MM in CI listeners as a function of the spatial distance between masker and signal and of the temporal envelope relation between them. For the fixed signal modulation frequency of 50 Hz, the masker modulator frequency/type had significant effects, with lower-frequency modulators being more effective than higher-frequency modulators. For a masker at the same modulation frequency as the signal, the relative phase of modulation had a significant effect on the interaction, although the direction of the effect varied across subjects. In general, we found evidence for significant amounts of across-channel (centrally mediated) envelope interactions in CI listeners, even when the masker and the signal were separated by 12 mm along the electrode array.