Introduction

Increased understanding of the factors underlying variability in outcomes of cochlear implant (CI) users is important for the design of new speech processing strategies or the customization of speech processors to individuals. Differences in the ability to resolve spectral peaks (formants) of vowels and consonants have been hypothesized to underlie variability in speech understanding (Henry and Turner 2003; Litvak et al. 2007; Won et al. 2007). This hypothesis is testable only if spectral resolution can be measured validly and reliably. In this paper, a new assessment of spatial resolution (related to spectral resolution) is presented that has some theoretical advantages over previous methods.

Spatial resolution (referring to the ability to perceptually resolve stimulation on different CI electrodes as opposed to the ability to resolve frequencies, or spectral resolution) must be distinguished from the spatial specificity of neural activity. The latter term refers to the degree of spread of neural activity across cochlear place, given the same overall neural response, whereas spatial resolution refers to the ability of a cochlear implantee to perceptually resolve the activity arising from different electrodes that are concurrently activated. Although poor spatial specificity will lead to poor spatial resolution, the ability to perceptually resolve activity from different electrodes relies on additional factors such as central processing ability and the degree of “noise” in the auditory pathway.

Measures related to spatial specificity have been obtained from spatial forward-masking functions (masking effect versus masker–probe distance) (Chatterjee and Shannon 1998; Throckmorton and Collins 1999; Hughes and Stille 2008; Nelson et al. 2008). It is unclear which metric of spatial specificity derived from the forward-masking function would best correlate with spatial resolution ability. Metrics of spatial specificity that have been used include the average masking across the function (Throckmorton and Collins 1999) and the absolute or normalized slope of the function (Chatterjee and Shannon 1998; Hughes and Stille 2008; Nelson et al. 2008). However, the units of masking (and whether normalized or not) and the part of the function used to measure the slope will alter the ranking of specificity metrics across subjects. The forward-masking literature reveals no consensus for the most valid way of extracting a specificity measure.

Spectral ripple discrimination (the ability to discriminate two sounds with high density spectral ripples in which the spectral peaks and valleys are reversed) has been used as a measure of spectral resolution in cochlear implant users and was shown to correlate with their speech perception performance (Henry et al. 2005; Won et al. 2007). However, the discrimination of these stimuli, even when presented to normally hearing listeners, can be influenced by factors additional to the ability to resolve the ripples in the stimuli, such as differences in loudness, spectral centroid, and changes to the spectral edges (Supin et al. 1998). These issues may be exaggerated when sounds are presented through the speech processor, since in this case a small spectral shift in the acoustic signal may be translated into an easily detectable shift in electrode position at the output of the processor, especially if a subset of electrodes is selected in each cycle as in the advanced combination encoder (ACE) strategy. Thus, the spectral ripple test is likely to be influenced by factors other than spectral resolution, and it is unclear which underlying psychophysical ability drives the correlations seen with speech perception.

Saoji et al. (2009) measured spectral modulation transfer functions (SMTFs). Regression analysis showed that, after detection of very low density ripples (in which there is only a global shift in spectral shape) was taken into account, the ability to detect greater density ripples did not contribute significantly to the correlation with speech perception. However, detection of low density ripples is only related to the ability to detect changes in global spectral shape and not to spectral resolution. Spectral resolution is theoretically related to the shape of the SMTF (how sensitivity to ripple amplitude changes with ripple density), but the function shape can be affected by contaminating factors similar to those in the spectral ripple discrimination test. The correlation found by Saoji et al. suggested that implantees rely more on global spectral information than on fine spectral information to identify speech, at least in quiet, perhaps because they have limited access to the latter, or because it is unreliable, leading CI users to give it low perceptual weight. This hypothesis might also explain the correlation found between spectral ripple discrimination and speech perception.

The psychophysical method proposed in this study measured the ability of cochlear implantees to discriminate differences in stimulus pattern across electrodes (spatial resolution) while controlling the contaminating effects of spectral shift or overall loudness cues. The discrimination task required a single spatial ripple in a multiple electrode stimulus to be resolved from the stimulus background. The effect of factors other than spread of peripheral neural activity (spatial specificity) on resolution ability, such as the ability to detect spectral contrasts, was considered and the correlation between the measure of spatial resolution and speech perception performance was evaluated.

Methods

Subjects and equipment

Eight post-linguistically deafened CI users took part. They were all users of the Nucleus Freedom implant manufactured by Cochlear Ltd. The details of each participant are listed in Table 1. Electrical stimuli for the psychophysical experiment were generated using the ImpResS program and SPEAR research processor (see Acknowledgments). The ImpResS program sent stimulus instructions to a SPEAR research processor which in turn sent coded instructions to the implant. Clinical current level (CL) units ranging between 0 and 255 were used to control current amplitude. In Freedom implants, the relation between current in microamps and CL units is $I(\mu\mathrm{A}) = 17.5 \times 100^{\mathrm{CL}/255}$, thus one CL increase is equivalent to 0.157 dB increase in current. The psychophysical procedures were performed via the ImpResS software and the behavioral responses were collected using a custom built response box. Stimuli for speech tests were presented acoustically through participants’ clinical speech processors in free field. All processors used the ACE stimulation strategy and at least 20 active electrodes. An adaptive procedure for obtaining speech reception thresholds in background noise was implemented in a custom MATLAB script based on the PsyLab psychophysical toolbox (Hansen 2006).
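The CL-to-current relation can be checked numerically. The following is a minimal sketch that reads the relation as 17.5 × 100^(CL/255), the form that reproduces the stated 0.157 dB per step; the function names are illustrative and not part of the ImpResS software.

```python
import math

BASE_CURRENT_UA = 17.5   # current at CL = 0, in microamps (from the text)
MAX_CL = 255             # top of the clinical current level scale

def cl_to_microamps(cl: int) -> float:
    """Convert a clinical current level (0-255) to current in microamps."""
    if not 0 <= cl <= MAX_CL:
        raise ValueError("CL must lie between 0 and 255")
    return BASE_CURRENT_UA * 100 ** (cl / MAX_CL)

def db_per_cl_step() -> float:
    """Change in current, in dB, produced by a one-CL increase."""
    return 20 * math.log10(100 ** (1 / MAX_CL))

print(round(db_per_cl_step(), 3))       # 0.157
print(round(cl_to_microamps(255), 1))   # 1750.0
```

Under this reading, the full 255-step scale spans a factor of 100 in current, that is, 40 dB.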

TABLE 1 Subject details

General principles of the spatial peak detection test

Rationale of spatial resolution experiment

In the spatial peak detection (SPD) method, the task of the subject was to discriminate a spectrally non-flat (SNF) stimulus from a spectrally flat (SF) one. Figure 1 shows a schematic diagram of SF and SNF stimuli. The SF stimulus was generated by interleaving loudness-balanced pulse trains on 11 adjacent electrodes. Four SNF stimuli were then generated from the SF stimulus by reducing the current on two electrodes symmetrically located on either side of the middle electrode to a level close to zero (one clinical current level, equivalent to approximately 17 μA) while increasing the current level on the middle electrode so that the SNF stimuli evoked the same overall loudness as the SF stimulus. Thus, the four SNF stimuli (with the valleys located at 1, 2, 3, or 4 electrodes distant from the peak electrode) all contained a single spectral ripple (one peak with flanking valleys) but varied in the width of the ripple. It was assumed that the amplitude of the ripple (in neural response terms) was greater for wider ripples due to less smearing of peak and valley information and similarly greater for subjects who had more place-specific neural activation. Since the overall spectral centroid, the spectral edges, and the overall loudness did not vary between SNF and SF stimuli, their discrimination was a measure of how well the spectral ripple was detected without the influence of these other cues. The psychometric function constructed from the discrimination scores of different SNF stimuli from the SF stimulus was used to calculate a measure of spatial resolution for each participant.

FIG. 1.

The electrodes activated in the spectrally flat (SF) and spectrally non-flat (SNF) stimuli. The sizes of the arrows represent the loudness contributions of each active electrode.

The hypothetical peripheral neural responses evoked by SNF1 and SNF3 stimuli in a cochlear implant user are illustrated in Figure 2 (panel A). Because of less neural interaction between the stimulation on the more distant peak and valley electrodes, the neural response amplitude of the ripple is larger for SNF3 than for SNF1, and thus SNF3 is easier than SNF1 to discriminate from the SF stimulus.

FIG. 2.

Schematic of hypothetical neural response patterns evoked by A SNF1 and SNF3 stimuli in the same implantee, B the SNF1 stimulus for two subjects P and Q, where Q has more spread of neural response across the cochlea, and C SNF1 (rippled curves) and SF (flat lines) stimuli for two subjects R and S who have similar spatial specificity of neural response, but where S has a higher neural response variance over time (shown by hashed area).

Figure 2 (panel B) illustrates the hypothetical peripheral neural response evoked by the same SNF1 stimulus for two subjects P (blue curve) and Q (red curve) who have the same relationship of current to neural response and loudness, but where Q has the poorer spatial specificity of the neural activity. It can be seen that the theoretical amplitude of the ripple in the neural response pattern and thus the perceptual amplitude of the spectral ripple are larger for subject P, who would therefore better discriminate the rippled stimulus from the flat stimulus than would subject Q.

However, the slope of the current-to-neural response function (and hence current-to-loudness function) will differ between individuals, leading to similar differences in relative neural response ripple amplitude to those induced by differences in specificity when the same stimulus (that is, the same current increment on the peak electrode) is used. To reduce the influence of individual differences in current-to-loudness slopes on performance in the SPD task, the current in each SNF stimulus on the peak electrode only was individually adjusted so that the SNF stimulus was of equal loudness to the SF stimulus (see “Stimuli for SPD test” below for details). Thus, the loudness contribution of the peak electrode was always theoretically approximately three times the loudness contribution of the individual flanking electrodes (i.e., the same loudness contribution as that of the peak and two valley electrodes in the SF stimulus). This approximation is derived from the loudness model of McKay et al. (2003) in which it was shown that individual pulses in a stimulus time window contribute approximately independent specific loudness amounts to the total loudness. The individual adjustment of peak electrode current levels in this way limited any influence of differences in current-to-loudness slopes on the perceptual amplitude of the peaks of the SNF stimuli and hence on performance in the SPD task.
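The factor of three follows from simple additivity. As a sketch, assume that each of the 11 loudness-balanced electrodes contributes the same specific loudness in the SF stimulus and that the two valley electrodes at one CL contribute essentially nothing. The symbols \ell, L_SF, and L_SNF are introduced here for illustration only and do not appear in the original.

```latex
L_{\mathrm{SF}} \approx 11\,\ell, \qquad
L_{\mathrm{SNF}} \approx 8\,\ell + \ell_{\mathrm{peak}}, \qquad
L_{\mathrm{SNF}} = L_{\mathrm{SF}} \;\Rightarrow\; \ell_{\mathrm{peak}} \approx 3\,\ell
```

The peak electrode therefore has to supply the loudness previously carried by itself and the two silenced valley electrodes.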

Rationale of intensity difference limens experiment

It can be argued that the ability to perceptually resolve the ripple peaks and valleys in the SNF stimuli depends on both the perceptual amplitude of the ripple (related to spatial specificity of the peripheral neural response as explained above) and the ability to detect differences in the pattern of neural responses across the cochlea. Variability in the neural response over time will limit the ability of the subject to distinguish changes in the neural response pattern that are due to different stimuli from changes that are due to the underlying variance itself. Thus, performance on the SPD task is hypothesized to be determined both by the place specificity of the neural activity and by its underlying variability. Figure 2 (panel C) illustrates the hypothetical neural response evoked by SNF1 and SF stimuli in two subjects R (red curve and straight line, respectively) and S (blue hashed regions) who have identical average neural response ripple amplitude. However, subject S has a more variable neural response than subject R (i.e., each time the stimulus is played the neural response differs, or the neural response varies across the duration of the stimulus). In this case, subject S will find it more difficult to discriminate SNF from SF stimuli than subject R, in spite of having equal spatial specificity. More precisely, the ability to discriminate the rippled stimulus from the flat stimulus depends on the perceptual variance over time, to which neural variance at all stages of auditory processing, from peripheral to more central, contributes.

Similarly, it can be argued that the ability to perceptually resolve the spectral details of an acoustic stimulus, such as speech that is processed by the speech processor, will depend upon the same two factors (spatial specificity and neural response variance). Again, the variations in current that correspond to the variations in spectral shape of the acoustic stimulus will be adjusted (at least approximately) in different subjects to achieve the same variations in specific loudness through the application of their individually fitted map in the processor. Those individuals who have a greater current-to-loudness slope will have a smaller electrical dynamic range, and acoustic variations in intensity will be translated to smaller variations in current level, but similar variations in specific loudness, than those individuals with a lesser current-to-loudness slope. Thus, the individual differences in current-to-loudness slopes should not contribute very significantly to the differences in the ability to detect changes in acoustic spectral shape. However, the variability of the neural response will affect the ability to detect changes in spectral shape, even among subjects for whom spatial specificity of the neural activity is equal.

In summary, the ability to perceptually resolve peaks and valleys in the SPD task should be analogous to, and thus predict, the ability to perceptually resolve acoustic spectral components in a speech signal, as the resolution ability depends both on the spatial specificity of the neural response and on the neural response variability over time. The differing contributions of these two factors to resolution ability are irrelevant to the question of whether differences in resolution ability are a significant factor underlying differences in performance with the implant. However, the two factors have different implications for how spatial resolution might be improved. If, for example, underlying variation in current spread among implantees (leading to differences in spatial specificity of the neural response) accounts for most of the variability in resolution, then efforts to make stimulation more place-specific are likely to lead to better resolution ability. However, if the variability in neural response is the main factor underlying differences in resolution, then efforts to focus stimulation may or may not lead to a general improvement in resolution ability and, in either case, may not help to close the gap between users with good and poor spatial resolution. Indeed, in subjects with large degrees of neural response variance, it may not be possible to improve spatial resolution above a certain limit, no matter how focused the neural response.

In this study, intensity difference limens (IDLs) were measured using the peak electrodes at their reference current levels, and their relation with performance in the SPD task was evaluated. The rationale for this step was to evaluate the potential contribution of neural response variance to resolution ability as measured by the SPD test. According to signal detection theory (Swets et al. 1961), IDLs are expected to depend both on the current-to-loudness slope and on the variance in the neural response. The fact that IDLs are generally not highly correlated with dynamic ranges (for example, r = 0.5 found by Nelson et al. 1996), and hence are not highly correlated with current-to-loudness slopes, supports the notion that variance in neural response varies significantly among CI users. While the SPD test is expected not to be related to the current-to-loudness slopes (as explained above), a correlation between performance on the SPD task and IDLs, regardless of the strength of the correlation between IDLs and current-to-loudness slopes, would suggest that the variance in neural response over time is a common factor underlying these two measures and, therefore, an important factor influencing variations in performance on spatial resolution tasks using a discrimination paradigm.

Stimuli for SPD test

The SF and SNF stimuli were created by interleaving pulses in monopolar mode on 11 electrodes (E9 to E19) for a duration of 500 ms. Each electrode was activated by biphasic pulses (25 μs phase duration, 8 μs interphase gap) at the rate of 900 pulses/s per electrode leading to an overall stimulation rate of 9,900 pulses/s. The pulses were interleaved across electrodes in the order from E9 to E19 and were evenly spaced in time. Stimuli with different spectral profiles were obtained by manipulating the current levels of the activated electrodes.
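The timing described above can be laid out explicitly. This is a minimal sketch of the pulse schedule only (onset time and electrode for each pulse); the function and constant names are hypothetical, and pulse shape and current levels are omitted.

```python
ELECTRODES = list(range(9, 20))   # E9 .. E19, 11 electrodes
RATE_PER_ELECTRODE = 900          # pulses/s on each electrode
DURATION_S = 0.5                  # 500 ms stimulus

def pulse_schedule(electrodes=ELECTRODES, rate=RATE_PER_ELECTRODE,
                   duration=DURATION_S):
    """Return (onset_time_s, electrode) pairs for evenly spaced interleaved pulses."""
    overall_rate = rate * len(electrodes)        # 9,900 pulses/s for 11 electrodes
    n_pulses = round(overall_rate * duration)    # total pulses in the stimulus
    return [(i / overall_rate, electrodes[i % len(electrodes)])
            for i in range(n_pulses)]

schedule = pulse_schedule()
print(len(schedule))        # 4950 pulses in total, 450 per electrode
print(schedule[:3])         # first pulses fall on E9, E10, E11
```

Successive pulses are about 101 μs apart (1/9,900 s), and each electrode is revisited every 1/900 s.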

The range of electrode positions was chosen to be that assigned to the frequency region in which fine spectral resolution would be postulated to be the most important for speech understanding (i.e., the region containing the vowel formants). The frequency assigned to the center electrode (E14, around 1.1 kHz) falls in the middle of the vowel formant frequency range, and the valley electrode positions span frequencies from 600 Hz to 2 kHz which span most of the range of the first and second formants.

Spectrally flat stimulus

The SF stimulus was generated by using current levels that elicited equal loudness sensations on electrodes E9 to E19 individually (Fig. 1). The stimuli used for adjusting the loudness of individual electrodes were biphasic pulse trains presented at the rate of 9,900 Hz for 500 ms. The high stimulation rate of 9,900 Hz was used for each single electrode stimulus, rather than 900 Hz, so that the overall loudness changed very little when all the 11 electrodes were sequentially interleaved with the same levels at the rate of 900 pulses/s per electrode (McKay et al. 2001, 2003). This avoided the necessity (as confirmed in this study) of further adjusting currents in the 11 electrode stimulus to achieve a comfortable loudness, which would potentially unbalance the relative loudness contributions across electrodes.

Stimuli on the individual electrodes were loudness-balanced with that on E14, which was initially set to 80% of the dynamic range (DR), or to the level closest to 80% DR that was considered comfortable by the subjects. DR was defined as the difference in clinical current steps between the threshold and the maximum tolerable level. The maximum tolerable level was obtained by gradually increasing the stimulus level until the subject indicated that the perceived loudness would be intolerable if the level was increased further. To obtain the thresholds on E14, an adaptive three-interval three-alternative (3I3A) forced choice task was used presenting three intervals separated by 500 ms in which two were silence and one (in a random position) contained the stimulus. The participants were asked to press the button corresponding to the interval that contained the stimulus. A two-down one-up adaptive procedure was used until ten reversals were reached of which the last six were averaged to obtain the threshold. The step size was set to four CL for the first two turns and two CL for the further turns.
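The adaptive rule described above can be sketched as follows. This is an illustrative implementation rather than the ImpResS procedure itself: the function name, the `respond` callback, and the `max_trials` guard are assumptions, and "turns" is interpreted as reversals.

```python
def run_staircase(respond, start_level, n_down=2, big_step=4, small_step=2,
                  n_reversals=10, n_average=6, max_trials=1000):
    """Run an n-down one-up staircase on levels in CL.

    respond(level) returns True for a correct response at that level.
    Returns the mean level at the last n_average reversals.
    """
    level = start_level
    correct_run = 0
    last_direction = 0              # -1 = last move was down, +1 = up
    reversals = []
    for _ in range(max_trials):
        if respond(level):
            correct_run += 1
            if correct_run < n_down:
                continue            # hold the level until n_down correct in a row
            correct_run = 0
            direction = -1
        else:
            correct_run = 0
            direction = +1
        if last_direction and direction != last_direction:
            reversals.append(level)  # record the level at which the track turned
            if len(reversals) == n_reversals:
                return sum(reversals[-n_average:]) / n_average
        last_direction = direction
        # four-CL steps until two reversals have occurred, two-CL steps afterwards
        step = big_step if len(reversals) < 2 else small_step
        level = max(0, min(255, level + direction * step))
    raise RuntimeError("staircase did not reach the requested number of reversals")

# Deterministic toy listener who detects any stimulus at or above 100 CL
threshold = run_staircase(lambda level: level >= 100, start_level=120)
print(threshold)
```

With `n_down=1`, and with `respond` reporting whether the test stimulus was judged louder, the same function gives a one-up one-down track of the kind used for loudness balancing.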

After setting the reference level on E14, the stimulation levels on the other electrodes that produced equal loudness to that on E14 were obtained using an adaptive two-interval two-alternative (2I2A) forced choice task. The participants’ task was to press the button corresponding to the louder interval. The two intervals, separated by 500 ms, contained the stimulus on the reference electrode (E14) and the test electrode in random order. The level of the stimulus on E14 was fixed at the reference level and the level of the test electrode was varied in a one-up one-down adaptive procedure with step size set to four CL initially and reduced to two CL after two turns. The procedure continued until ten reversals were reached from which the average of the last six was recorded. The average balanced level obtained from either one or two runs of the adaptive procedure was set as the current level for the test electrode. The starting point of the test stimulus was always higher than the reference if the procedure was performed only once and started from a quieter level in the second run if the procedure was repeated.

For four of the participants, the whole experiment was repeated at a lower loudness to evaluate the effect of current level on the measure of spatial resolution. The reference level for E14 was set this time to 50% DR, and the stimuli on other electrodes were loudness-balanced again with the reference stimulus on E14 to obtain current levels for the SF stimulus.

Spectrally non-flat stimuli

The SNF stimuli were generated by changing the current levels on three electrodes in the SF stimulus to create a spectral peak and flanking valleys (Fig. 1). In each SNF stimulus, the levels of the two electrodes symmetrically located on either side of the middle electrode of the SF stimulus (E14) were set to one CL to create valleys. The valley electrodes were (E13, E15), (E12, E16), (E11, E17), and (E10, E18) in the four SNF stimuli called SNF1, SNF2, SNF3, and SNF4, respectively. The spectral peak was then generated by increasing the current level on E14 only (while keeping all other current levels constant) until the SNF stimulus was loudness-balanced with the SF stimulus. To find the level of E14 (peak electrode), a 2I2A forced choice task was used that presented the SF and the test SNF stimulus in random order. The participants were asked to press a button corresponding to the louder stimulus. The level of E14 in the test SNF stimulus was varied in a one-up one-down adaptive procedure with the step size set to four CL initially and decreased to two CL after two turns. The adaptive procedure continued until ten reversals were reached. Two runs of the adaptive procedure were used to set the final level on E14: one run started with a level at which the SNF stimulus was louder than the SF stimulus, and the other started with a level at which the test stimulus was perceived as quieter. The average of the last six reversals of the two runs was set as the peak level for the test SNF stimulus. For those participants who were also tested at the lower stimulation level, the SNF stimuli were obtained as explained above from the SF stimulus generated using the levels at 50% DR.
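The construction of the four SNF level profiles can be sketched as below. The function name and the example current levels (180 CL for the SF stimulus, 197 CL for the balanced peak) are hypothetical; in the experiment the peak level came from the loudness-balancing procedure.

```python
VALLEY_CL = 1    # valley electrodes are set to one clinical current level
PEAK = 14        # peak electrode E14

def make_snf(sf_levels: dict, distance: int, peak_level: int) -> dict:
    """Return the CL profile of SNF<distance> derived from the SF profile.

    sf_levels maps electrode number (9..19) to its loudness-balanced CL.
    peak_level is the CL on E14 found by loudness balancing against SF.
    """
    if distance not in (1, 2, 3, 4):
        raise ValueError("peak-valley distance must be 1, 2, 3, or 4 electrodes")
    snf = dict(sf_levels)                 # copy so the SF profile is untouched
    snf[PEAK - distance] = VALLEY_CL      # apical-side or basal-side valley
    snf[PEAK + distance] = VALLEY_CL      # valley on the opposite side
    snf[PEAK] = peak_level                # raised peak compensates for the valleys
    return snf

sf = {e: 180 for e in range(9, 20)}       # hypothetical flat profile
snf3 = make_snf(sf, distance=3, peak_level=197)
print(snf3[11], snf3[14], snf3[17])       # 1 197 1
```

The edge electrodes E9 and E19 are never modified, which is why the spectral edges stay the same across SF and SNF stimuli.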

Experimental procedures

Spatial peak detection method

The discrimination of SF from SNF stimuli was measured using a four-interval four-alternative forced choice task. The four intervals were separated by 500 ms: three contained the reference SF stimulus and the other (in random position) contained one of the SNF stimuli. The participants were asked to choose the interval with the different (SNF) stimulus by pressing the corresponding button on the response box. Although the stimuli were all loudness-balanced, a small random level jitter (±2 CL) was applied to each interval to overcome any residual loudness differences among the stimuli. Each random jitter value was applied to the currents on all 11 electrodes of the stimulus in that interval. To validate the amount of level jitter, the SF stimulus was loudness-balanced with each of the SNF stimuli by varying an offset level added to the current levels on all its electrodes in an adaptive procedure. The obtained offset value was always in the range ±2 CL, confirming that the jitter was sufficient. The participants were informed that the stimuli in each trial would differ in loudness and were instructed to choose the different (SNF) stimulus on the basis of properties other than loudness. No feedback regarding the correct response was provided. The experiment contained a total of 200 trials that were presented in 2 blocks of 100 trials. The SNF stimuli (SNF1 to SNF4) were presented in pseudo-random order, each appearing in 25 trials of each block. The percentage of trials in which each SNF was identified correctly (out of 50 trials) was recorded as the discrimination performance for that SNF stimulus. Before starting the main experiment, around 120 training trials (30 per SNF stimulus) were presented without feedback to familiarize the participants with the task and the stimuli. A psychometric function of percent correct discrimination versus peak–valley distance was constructed and used to obtain a measure of spatial resolution for each participant.
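The trial structure and scoring can be sketched as follows. The helper names are hypothetical, and the jitter is assumed here to be an integer drawn uniformly from -2 to +2 CL, which the text does not specify.

```python
import random
from collections import Counter

SNF_TYPES = (1, 2, 3, 4)

def build_block(rng, trials_per_snf=25):
    """One block of 100 trials: each SNF stimulus 25 times, in shuffled order."""
    block = [snf for snf in SNF_TYPES for _ in range(trials_per_snf)]
    rng.shuffle(block)
    return block

def make_trial(rng, snf, n_intervals=4, jitter_cl=2):
    """Pick the target interval and one jitter value (in CL) per interval."""
    return {
        "snf": snf,
        "target_interval": rng.randrange(n_intervals),
        # assumption: integer jitter drawn uniformly from -2..+2 CL,
        # applied to all 11 electrodes of that interval
        "jitter_cl": [rng.randint(-jitter_cl, jitter_cl) for _ in range(n_intervals)],
    }

def percent_correct(trials, responses):
    """Percent correct per SNF stimulus; responses[i] is the chosen interval."""
    hits, totals = Counter(), Counter()
    for trial, response in zip(trials, responses):
        totals[trial["snf"]] += 1
        hits[trial["snf"]] += response == trial["target_interval"]
    return {snf: 100.0 * hits[snf] / totals[snf] for snf in totals}

rng = random.Random(0)
trials = [make_trial(rng, snf) for _ in range(2) for snf in build_block(rng)]
print(len(trials))   # 200 trials: 2 blocks of 100
```

A listener guessing at random among the four intervals would be expected to score about 25% on each SNF stimulus.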

Intensity difference limens

Seven of the eight subjects took part in this experiment. Intensity DLs were measured for current increments on electrode 14 (peak electrode) in two conditions in which the reference stimulus was (1) a single electrode pulse train on E14 with a rate of 900 Hz using the reference current level used on E14 in the SF stimulus and (2) the multiple electrode SF stimulus at the same reference current level. The reference current on E14 was the same in each case, but the reference stimulus in condition (2) was louder than that in the single electrode condition (1). For each condition, the IDL on E14 was found using an adaptive 3I3A forced choice task that presented the reference stimulus in two intervals and, in the other interval (in a random position), the same stimulus with a higher current level on E14. The participants’ task was to press the button corresponding to the interval that contained the louder stimulus. Both stimulus duration and inter-stimulus interval were 500 ms. The current level on E14 in the test interval was varied in a two-down one-up adaptive procedure until ten reversals were reached. The step size was set to four CL for the first two turns and two CL for the remaining turns. The last six reversals obtained from two runs of the procedure were averaged to obtain the minimum detectable level increment on E14 (in CL) for each condition.

Speech perception

Scores for perception of consonant-vowel nucleus-consonant (CNC) words were obtained from seven of the eight subjects and recognition of IEEE sentences in quiet and noise was obtained from all the participants. Speech stimuli were presented acoustically at 65 dB SPL in free field through the participants’ own speech processor, which was set to the program that had no noise rejection algorithm. The participants were tested in a sound-attenuating booth and listened to the stimuli presented from a loudspeaker positioned at 1 m distance. In the CNC word test, the participants’ task was to repeat a meaningful or nonsense word they had perceived after listening to each of the 250 CNC words presented in quiet. Their responses were recorded by the experimenter and the total percentages of phonemes, consonants, and vowels that were identified correctly were scored offline. The score for sentence recognition in quiet was estimated as the percentage of correctly identified key words in a total of 20 sentences. The participants’ task was to repeat any word they could recognize after listening to each sentence. Speech reception in background noise was measured as the signal-to-noise ratio (SNR) at which the performance reached a target score which was 50% of the score in quiet. The rationale for using 50% of the score in quiet as the target, rather than a fixed absolute score (such as 50% correct), was to account, at least partially, for differences in ability to understand speech in quiet when determining the effect of noise on understanding. If the differences in scores in quiet are modeled as differences in listener efficiency in a standard speech intelligibility index model (rather than differences in input information), then the point at 50% of the maximum on the psychometric function (rather than 50% correct) should represent the same decrement in input information due to noise across the different subjects.

Speech-in-noise scores were obtained using a set of new IEEE sentences presented in eight-talker babble (four females). The SNR was calculated as the ratio in decibels of the signal and noise root mean square values. The SNR at which the performance reached the target score was found by using an adaptive one-up one-down procedure that varied SNR by increasing or decreasing the level of noise while keeping the signal at the constant level of 65 dB SPL. In each trial, two sentences (one at a time) were presented (total of 10 keywords) and the percentage of correctly identified keywords was used to set the SNR for the next trial (set of two sentences) based on whether it was greater or less than 50% of the score in quiet. The procedure started from a high SNR of 25 dB and changed the SNR initially with a step size of 8 dB, which was halved after each reversal until a minimum step size of 2 dB was reached. The adaptive procedure continued until a total of 20 reversals was reached. The average of the last ten reversals was taken as the SNR that resulted in 50% of the quiet score.
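The SNR tracking rule can be sketched as below. This is an illustration, not the PsyLab-based MATLAB script: the function name, the `score_at` callback, and the `max_trials` guard are assumptions, and a score exactly equal to the target is treated here as below target, a case the text does not address.

```python
def track_snr(score_at, quiet_score, start_snr=25.0, start_step=8.0,
              min_step=2.0, n_reversals=20, n_average=10, max_trials=500):
    """One-up one-down SNR track targeting 50% of the score in quiet.

    score_at(snr) returns the percent of keywords correct for one trial
    (two sentences, ten keywords) at that SNR in dB.
    """
    target = 0.5 * quiet_score
    snr, step = start_snr, start_step
    last_direction = 0
    reversals = []
    for _ in range(max_trials):
        # above target: make the task harder (lower SNR); otherwise easier
        direction = -1 if score_at(snr) > target else +1
        if last_direction and direction != last_direction:
            reversals.append(snr)
            step = max(min_step, step / 2)   # halve the step, down to 2 dB
            if len(reversals) == n_reversals:
                return sum(reversals[-n_average:]) / n_average
        last_direction = direction
        snr += direction * step
    raise RuntimeError("track did not reach the requested number of reversals")

# Deterministic toy listener: 80% correct at or above 5 dB SNR, 0% below
srt = track_snr(lambda snr: 80.0 if snr >= 5 else 0.0, quiet_score=80.0)
print(srt)
```

For the toy listener, the track settles into alternating between 3 and 5 dB, so the averaged reversals fall midway between the two.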

Results

SPD test

The SF and SNF stimuli were successfully constructed for all the subjects. The maximum current level difference among loudness-balanced electrodes of the SF stimulus varied between 6 CL (in S7) and 25 CL (in S1). The average current levels across subjects on the peak electrode (E14) for the SF and SNF stimuli at 80% DR are shown in Figure 3. A repeated-measures ANOVA confirmed that the current levels on E14 in the SNF stimuli were significantly higher than in the SF stimulus, to compensate for the valleys (F(4, 7) = 39.6, p < 0.001), and that average currents on E14 for the different SNF stimuli were not significantly different from each other (F(3, 7) = 2.59, p = 0.08).

FIG. 3.

Across-subject averages of current levels on the peak electrode (E14) of the SF and SNF stimuli. Error bars represent standard errors.

The results for discrimination of SF and SNF stimuli are plotted in Figure 4 as separate psychometric functions for the stimuli presented at 80% DR (upper panel) and 50% DR (lower panel). The horizontal axis is the SNF stimulus number, which also refers to the distance in electrode spacing between peak and valleys in the SNF stimulus, and the vertical axis is percent correct discrimination of each SNF stimulus from the SF stimulus. The SNF stimulus number zero refers to a virtual stimulus with valleys and peaks located on the same electrode (E14). This stimulus is identical to the SF stimulus; therefore, its discrimination from the SF stimulus (the unfilled circle) is set at chance level (25%) for the purposes of plotting the psychometric function. For seven out of eight participants, the discrimination performance improved as the distance between valley and peak electrodes increased for stimuli presented at 80% DR (Fig. 4, upper panel). Subject S8 showed a performance close to chance for all SNF stimuli. The data for higher numbered SNF stimuli were not collected in some subjects as the data for lower numbered SNF stimuli were already at ceiling. These presumed ceiling data were set to 100% in the following analyses.
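The construction of each psychometric function can be sketched as follows. The example scores are invented for illustration. This section does not define the summary measure of spatial resolution, so the 50% crossing computed below is only one possible summary, motivated by the 50% reference line in Figure 4, and should not be read as the authors' metric.

```python
CHANCE = 25.0   # chance level in a four-alternative task, used for the virtual SNF0

def psychometric_points(scores: dict) -> list:
    """Return (peak-valley distance, percent correct) pairs for SNF0..SNF4.

    scores maps SNF number to percent correct. Untested higher-numbered
    stimuli are treated as ceiling (100%), as described in the text.
    """
    points = [(0, CHANCE)]
    for snf in (1, 2, 3, 4):
        points.append((snf, scores.get(snf, 100.0)))
    return points

def crossing(points, criterion=50.0):
    """Linearly interpolate the first distance at which the criterion is reached.

    Returns None if the function never reaches the criterion.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion <= y1:
            return x0 + (criterion - y0) / (y1 - y0) * (x1 - x0)
    return None

points = psychometric_points({1: 40.0, 2: 60.0, 3: 100.0})   # SNF4 not tested
print(points)
print(crossing(points))   # 1.5 electrode spacings for these invented scores
```

A listener near chance on every stimulus, as reported for S8, never reaches the criterion, and `crossing` returns `None` in that case.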

FIG. 4. Psychometric functions constructed from the percent correct discrimination of SNF stimuli from the SF stimulus for the stimuli presented at 80% DR (upper panel) and 50% DR (lower panel). The numbers of SNF stimuli are also the distance between peak and valley electrodes (number of electrode spacings) in the corresponding stimulus. The unfilled circle shows the chance performance (25%) for a virtual SNF0 stimulus that was used for plotting purposes. The blue dashed line is the 50% discrimination performance.

A repeated-measures ANOVA revealed a significant difference among the discrimination scores of the different SNF stimuli from the SF stimulus presented at 80% DR (F(3, 7) = 18.5, p < 0.001), and Bonferroni post hoc analysis confirmed that performance improved as the distance between peak and valleys increased from 1 to 3 (p < 0.05). A paired t test found no difference between the scores of the first and second halves of the trials, pooled across subjects and conditions (t = 0.7, df = 20, p = 0.5), showing no effect of learning during the discrimination task. A high correlation was found between the performance in the two halves (r = 0.94, n = 25, p < 0.0001), indicating that the measures were reliable and repeatable. Note that better detection of the spectral ripple in the higher numbered SNF stimuli cannot be explained by higher physical levels on the peak electrode in these stimuli (Fig. 3).
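
The split-half checks described above (a paired t test for learning and a correlation for repeatability) can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names `pearson_r` and `paired_t` and the score lists are hypothetical, and the scores are made-up values chosen only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for x versus y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n), n - 1


# Hypothetical percent-correct scores for the same conditions,
# computed separately from the first and second halves of the trials.
first_half = [30.0, 55.0, 80.0, 95.0, 100.0]
second_half = [35.0, 50.0, 85.0, 90.0, 100.0]

r = pearson_r(first_half, second_half)    # repeatability across halves
t, df = paired_t(first_half, second_half)  # systematic shift (learning)
print(f"r = {r:.2f}, t = {t:.2f}, df = {df}")
```

A high r together with a t statistic near zero is the pattern reported in the text: scores are consistent across halves and show no systematic improvement.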

Discrimination performance was poorer at the lower level than at the higher level for the four subjects tested at 50% DR (Fig. 4, lower panel). Only the best performer at the higher level (S1) achieved a score close to 100% correct at the lower level. The remaining four subjects (S4, S6, S7, and S8), who performed most poorly at the higher level, were expected to perform at chance on the basis of these results and were therefore not tested at the 50% DR level. In the four subjects tested at the lower level, the average current increment on the peak electrode in the SNF1 stimulus (compared to that in the SF stimulus) was 17 and 23 current levels for the stimuli at 80% DR and 50% DR, respectively. Thus, the peak height (relative to its reference current level) could not explain the performance difference at the two loudness levels.

Spatial resolution measure

Spectral resolution is inversely related to the minimum spectral distance between the components that are resolved or represented independently in the auditory system. The minimum spatial distance between peak and valley electrodes required for detecting the single spectral ripple was used to define spatial resolution in this study. The interpolated peak-to-valley distance for 50% discrimination performance was defined as the threshold distance for each participant. For the stimuli presented at 80% DR, these values were obtained by fitting the psychometric curves with a sigmoid function, with the exception of two subjects whose data were fitted linearly: S1 scored close to 100% at one electrode spacing, and S8 did not obtain any scores above 50% correct, so the 50% points of these two subjects were obtained by linear extrapolation. The psychometric functions obtained at 50% DR (Fig. 4, lower panel) were fitted by a linear function, as they represented only a portion of a hypothetical full sigmoid function. The obtained peak-to-valley threshold distances were converted into millimeters using the electrode spacing specifications of Freedom implants provided by Cochlear Ltd. The inverse of the minimum resolvable distance (in millimeters) was used as the measure of spatial resolution for each participant (thus bigger values represent better resolution). These values, plotted in Figure 5, satisfied the normality criterion and were used in the statistical analyses.
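
The chain from psychometric function to resolution measure (50% threshold distance, conversion to millimeters, inverse) can be sketched as below. This is an illustration under stated assumptions rather than the fitting procedure used in the study: it substitutes piecewise linear interpolation for the sigmoid fit, the function names and the scores are hypothetical, and `spacing_mm=0.7` is a placeholder value, not the manufacturer's electrode spacing specification.

```python
def threshold_distance(distances, percent_correct, criterion=50.0):
    """Distance (in electrode spacings) at which the score first crosses
    the criterion, by linear interpolation between neighbouring points."""
    points = list(zip(distances, percent_correct))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            return d0 + (criterion - p0) * (d1 - d0) / (p1 - p0)
    raise ValueError("criterion not crossed within the measured range")


def spatial_resolution(threshold_electrodes, spacing_mm):
    """Inverse of the minimum resolvable distance, in units of 1/mm."""
    return 1.0 / (threshold_electrodes * spacing_mm)


# Hypothetical psychometric function. Distance 0 is the virtual SNF0
# stimulus, fixed at chance (25%) as in the text.
distances = [0, 1, 2, 3, 4]
scores = [25.0, 40.0, 60.0, 90.0, 100.0]

thr = threshold_distance(distances, scores)
# spacing_mm is a placeholder, NOT a device specification.
res = spatial_resolution(thr, spacing_mm=0.7)
print(f"threshold = {thr:.2f} electrode spacings, resolution = {res:.2f} per mm")
```

A subject whose scores never cross 50% within the measured range (as for S8) raises an error here; the study handled that case by linear extrapolation, which this sketch does not implement.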

FIG. 5. The values of spatial resolution for each subject (per millimeter) obtained from stimuli presented at 80% DR (blue bars) and 50% DR (red bars). Subjects S4, S6, S7, and S8 were not tested at the lower (50% DR) level.

The increment of current on the peak electrode of SNF stimuli compared to its current in the SF stimulus (the current increment required to make the stimulation on the peak electrode 3 times louder) is theoretically dependent on the current-to-loudness slope of each subject. Subjects who had a greater slope would be predicted to need a smaller current increment on the peak electrode to match the loudness to the SF stimulus. The values for current level increment on the peak electrode from SF to SNF1 stimulus are shown in Table 2 (column 2). The variability in spatial resolution across subjects was not explained by the size of the current level increment of the peak electrode compared to its level in the SF stimulus (r = −0.41, p = 0.32, n = 8). That is, as theoretically expected, performance in the SPD task was not related to the current-to-loudness slope.

TABLE 2 Current level increment on the peak electrode in the SNF1 stimulus relative to that in the SF stimulus (column 2), IDLs on E14 for single electrode stimulus (column 3), and multiple electrode stimulus (column 4)

IDLs and resolution ability

The current level IDLs of E14 are shown in Table 2 (columns 3 and 4). Greater values for IDLs were obtained in the multiple electrode condition than in the single electrode condition (t = 4.3, df = 6, p = 0.005), and a significant correlation was found between the IDLs obtained in the two conditions (r = 0.82, p = 0.02, n = 7). The multiple electrode condition of the IDL experiment was similar to that used by Drennan and Pfingst (2006) and Goupell et al. (2008). They showed that these IDLs were consistent with detection of loudness difference between reference and test stimuli rather than detection of changes in spectral shape. A moderate but nonsignificant correlation was found between IDLs in the single electrode condition and current level increments on the peak electrode (r = 0.72, p = 0.07, n = 7). No correlation was found between IDLs in the multiple electrode condition and current level increments on the peak electrode (r = 0.46, p = 0.3, n = 7). These results suggest that variance in the neural response contributes significantly to the IDLs in addition to the current-to-loudness slopes, as also implied by the results of Nelson et al. (1996), who found that dynamic range accounted for only 25% of the variability in intensity Weber fractions across CI users.

Spatial resolution measured at the 80% DR level was correlated with IDLs obtained in both the single electrode condition (r = −0.83, p = 0.02, n = 7) and the multiple electrode condition (r = −0.78, p = 0.04, n = 7). The ability to detect smaller changes in current on the peak electrode was associated with detecting narrower spectral ripples and explained 69% of the variability in the SPD task. Thus, following the arguments put forward in the “Methods” section, neural response variance over time could be the common factor underlying the correlation of spatial resolution with IDLs. The high correlation coefficient suggests that the neural response variance in these cochlear implant users, as well as the differences in spread of neural activity, contributed significantly to variability in spatial resolution.
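
For reference, the proportion of variance explained is the square of the correlation coefficient. Applying this to the two reported coefficients shows that the 69% figure corresponds to the single electrode condition (the subscripts are labels introduced here for illustration):

```latex
r^2_{\text{single}} = (-0.83)^2 \approx 0.69, \qquad
r^2_{\text{multiple}} = (-0.78)^2 \approx 0.61
```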

Speech recognition scores

Word and sentence recognition scores in quiet and noise are presented in Table 3. A high correlation was found between the sentence-in-quiet scores obtained in the first and second halves of the trials (r = 0.91, n = 8, p < 0.002), indicating that the sentence recognition measures were reliable and repeatable. Sentence-in-quiet scores were correlated only with the SNRs for 50% of the score in quiet (r = −0.93, n = 8, p < 0.001), whereas all the other speech measures were correlated with each other. No correlation was found between any of the measures of speech perception and spatial resolution (|r| < 0.24, p > 0.7 for all speech measures). Speech perception scores were also not correlated with the IDLs on E14 for either the single or the multiple electrode reference stimuli (|r| < 0.4, p > 0.4 for all speech measures).

TABLE 3 Speech perception scores

Discussion

The SPD test was successfully performed in eight subjects and showed a large range of resolution ability among the subjects. The test was shown to be reliable and was designed to control unwanted cues due to loudness changes and spectral shifts.

Although not the same as profile analysis (Spiegel and Green 1982; Green 1988), the SPD test uses a similar principle in that the subject must differentiate two spectral profiles. There have been two previous attempts to use a test analogous to a typical profile analysis for cochlear implantees (Drennan and Pfingst 2006; Goupell et al. 2008). Drennan and Pfingst (2006) measured the ability of subjects to detect increments or decrements in current on a single electrode within a multiple electrode stimulus. The results were shown to be well predicted by the loudness model of McKay et al. (2003), consistent with subjects detecting an overall change in loudness rather than a change in spectral profile. Goupell et al. (2008) performed a similar experiment with differing amounts of level jitter. However, when using sufficient jitter to control totally the use of within-channel level cues (as in true profile analysis), the task was too difficult for implantees to perform.

The SPD test was designed so that large amounts of level jitter were not necessary to control overall loudness cues, through the use of valley and peak level adjustments that equated the loudness of the SNF and SF stimuli while keeping the flanking electrode levels constant. A small amount of jitter, sufficient to control small errors in loudness balancing, was used, but it was not large enough to render the task impossible, as found by Goupell et al. (2008). Unwanted spectral cues, such as shifts in spectral centroid or edges that could be contaminating factors in the spectral ripple discrimination and spectral modulation transfer function methods, were also minimized by keeping the levels of the electrodes situated at the edges of the SF and SNF stimuli constant, and by using a ripple that was always centered in the stimulus.

Theoretically, it can be argued that the SPD test will allow subjects to monitor the within-channel level at the peak (or valley) electrode positions and thus differs from profile analysis in this respect. However, three aspects of the test design and results support the SPD test as a valid measure of spatial resolution:

  1. The discrimination ability of SNF from SF stimuli increased with peak–valley distance, as expected from the influence of spatial specificity (Fig. 2A).

  2. The SPD test controlled for subject differences in current-to-loudness slope (that could potentially contaminate the measure) by individually adjusting the peak current to maintain the loudness of SNF stimuli at the same loudness as the SF stimulus. The success of the individual adjustment in achieving this control was consistent with the noncorrelation of SPD performance with the current increment on the peak electrode.

  3. The resolution of acoustic frequencies when presented through the speech processor will rely on the same mechanisms as the SPD test: the place specificity of the neural activity and the ability to detect changes of neural response across cochlear place.

Although the better detection of the spectral ripple with increasing spatial distance between peak and valleys is in agreement with the effect of spread of excitation in the cochlea, the IDLs explained most of the variability in the measure of spatial resolution. IDLs depend upon both the current-to-loudness slope (controlled for in the SPD task) and the variance in the neural response. Therefore, it is likely that differences in neural response variance between subjects (common to both SPD and IDL tasks) were driving the correlation between IDLs and SPD performance. A higher variance would make current level changes harder to detect and also make the comparison of neural activity across cochlear place more difficult (Fig. 2C). The implication of this result is that both spatial specificity and neural response variance impact significantly on spatial resolution ability. Thus, focusing current fields to improve specificity may only partially help to improve spatial or frequency resolution ability. Alternatively, the low amount of variability that can be attributed to spatial specificity in the SPD task may be due to there being very little variation in spatial specificity across the subjects tested. In that case, improving specificity may improve spatial resolution ability similarly in all the subjects. However the results are interpreted, it is clear that the poor performers on the SPD task are those who have poor IDLs and thus an inferred greater degree of variance in the neural response.

The measure of spatial resolution was influenced by the presentation level of the stimuli. Although less electrode interaction might be expected at lower current levels, spatial resolution deteriorated greatly at the 50% DR level. This level effect could not be explained by differences in current increment on the peak electrode relative to its background level between the low and high level stimuli. While some other psychophysical measures such as temporal resolution are also greatly affected by presentation level (Galvin and Fu 2005; Chatterjee and Yu 2010), the spread of excitation as measured in forward masking does not change dramatically with level (Chatterjee and Shannon 1998; Nelson et al. 2008). A hypothesis that neural response variance is greater at lower current levels would be consistent with the effect of level on both the SPD and modulation detection tests, as well as IDL tests. It would also explain why the effect of level is not marked in forward masking, since in that case, the stimulus to be detected is presented in a background of silence and would therefore not be so affected by variance within the masker response. The effect of level has not been investigated in the spectral ripple discrimination method.

Spatial resolution measured with the SPD method was not found to be correlated with speech perception, either in quiet or in noise. The participants who could better resolve peak and adjacent valleys in multi-electrode stimuli did not necessarily benefit from this ability for speech understanding. It should be noted that, although our correlation tests lack power (n = 7), the low correlations (|r| < 0.24) indicate that, even if statistical significance were found with greater numbers of subjects, the amount of variance explained would probably still be very small. Speech perception was previously found (Henry et al. 2005; Won et al. 2007) to be correlated significantly, although only moderately, with results of the spectral ripple discrimination test (|r| < 0.6, n = 29 for all speech measures in both studies). However, as discussed before, factors other than spectral resolution are likely to have contributed to the correlation found in those studies.
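
To make the size of these effects concrete, the upper bounds on the correlation coefficients quoted above can be converted into upper bounds on the proportion of variance explained (a worked illustration using only the reported bounds):

```latex
r^2 < (0.24)^2 \approx 0.06 \quad \text{(SPD measure, this study)}, \qquad
r^2 < (0.6)^2 = 0.36 \quad \text{(spectral ripple discrimination, earlier studies)}
```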

The very low correlation in this study suggests that cochlear implantees do not rely on fine spectral information for speech understanding and that other information, such as global spectral cues, contextual cues, or temporal information, provides the important cues for speech perception. This does not necessarily mean that fine spectral information is not important for speech understanding; rather, it may mean either that cochlear implant users cannot access this information or that the information is so unreliable that it is afforded very little perceptual weight.

The possibility that subjects used other forms of information, such as top-down contextual cues, in the speech tests was reduced by including the CNC word test, which is little influenced by contextual information. Attentional or cognitive factors may influence performance in psychophysical tests more than in speech tests. However, this issue probably does not underlie the poor correlations found, since the results of the sentence-in-noise test, which presumably required more attention and cognitive effort, were not more strongly correlated with the psychophysical test results than were the results of the other speech tests. It could be argued that the measure of resolution using a fixed peak electrode position is not representative of the resolution across the electrode array and that, therefore, this study might not have been sensitive enough to demonstrate the relation between spectral resolution and speech. However, it should be noted that, since the valleys span the frequency range where speech formants are located, the SPD method does not measure resolution only at a fixed cochlear place. Another possible reason for not finding a relation between speech understanding and resolution ability might be the variability caused by the participants’ own speech processors (or how they were fitted) in the speech tasks. However, while processing strategies influence the representation of spectral and temporal information at the electrode level (Drennan et al. 2010), the same speech processor type and stimulation strategy were used by all the participants in this study. The psychophysical methods that apply direct electrical stimulation are more sensitive to individual physiological differences among implantees and thus are more appropriate than acoustic stimulation via speech processors for investigating individual differences that might be important for developing individually optimized processing strategies. The contaminating cues that are not always obvious when working through a speech processor can also be better controlled when direct electrical stimulation is used.

The inability of spectral resolution to explain variance in speech perception performance has also been found by other researchers. In a study that measured spectral modulation transfer functions, speech perception performance was found to be correlated with the smallest detectable spectral contrast in spectral ripple stimuli of different ripple densities (Litvak et al. 2007). However, a regression analysis revealed that the correlation was driven by the relation for very low ripple densities (0.25 and 0.5 cycles/octave) and that there were no further contributions to the correlation for stimuli with higher density ripples (Saoji et al. 2009). Since the lowest density ripples did not provide more than one ripple across the whole electrode array, these authors concluded that the ability to detect fine spectral detail does not necessarily provide better representation of the cues that are used for speech perception with a cochlear implant, or rather, that there are other factors, such as the ability to compare global spectral change, that have a greater impact upon speech understanding.

The hypothesis that implantees do not have access to reliable fine spectral information and therefore cannot utilize this information to understand speech is consistent with the evidence that implantees cannot take advantage of an increased number of analysis channels in the same way as normally hearing listeners (Dorman et al. 1997; Friesen et al. 2001; Fu and Nogaki 2005). It might be speculated that, if implantees only utilize global spectral information for speech understanding, then this information would be represented well with a limited number of channels and no benefit would be gained by providing additional spectral detail. Thus, the usual interpretation of the experiments that investigate the effect of numbers of channels (that asymptotic scores at low numbers of channels imply poor spectral resolution) may not necessarily be correct. Normally hearing listeners whose speech perception does improve with a greater number of channels may improve for reasons that are not applicable to or accessible by implantees, such as access to fine temporal information within the spectral bands, or the ability to make use of time- and frequency-limited dips in an interfering noise to detect and follow a speech signal.

Conclusions

The SPD test measured the spatial resolution ability of cochlear implantees while limiting contamination by overall loudness and spectral shift cues. The subjects varied in their ability to distinguish the spatial patterns, and this ability was greatly reduced at lower stimulation levels.

The correlation between the SPD resolution measures and the intensity difference limens was consistent with a common factor, such as variability of neural response over time, affecting both measures, and indicated that the differences in spatial resolution were not well explained by differences in current spread alone. The poor correlation with speech perception measures was consistent with a hypothesis that implantees do not have access to, or do not rely heavily on, fine spectral structure when identifying speech in quiet or noise.