Introduction

Auditory steady state responses (ASSRs) are stationary responses to repeated stimuli such as clicks (Galambos et al. 1981; Stapells et al. 1984) that exhibit constant amplitude and phase over time. For a thorough review on the topic, see Picton et al. (2003). The largest responses are obtained with stimulus frequencies around 40 Hz, but responses are also easily detectable in the 20 Hz and in the 80 Hz range (Cohen et al. 1991) and can be measured down to 2 Hz (Geisler 1960). The 40 Hz responses are attenuated by sleep or sedation (Picton et al. 2003), which is not the case with responses in the 80 Hz range. The response itself is thought to be the superposition of middle latency responses (MLRs) and auditory brainstem responses (ABRs) (Galambos et al. 1981; Bohorquez and Ozdamar 2008). More frequency-specific ASSRs can be evoked by tone bursts (Galambos et al. 1981), amplitude or frequency-modulated sine tones and beats (Hall 1979; Picton et al. 2003), or modulated noise (Purcell et al. 2004). For modulated stimuli, recording of responses to multiple simultaneous stimuli is possible (Lins and Picton 1995; John et al. 1998). ASSRs are used to predict hearing thresholds in adults (Rance et al. 1995) as well as in infants (Rance and Rickards 2002; Luts et al. 2006; Alaerts et al. 2009), where correlations with behaviorally determined thresholds exceed 0.95.

Menard et al. (2004) described electrically evoked auditory steady state responses (EASSRs) in cochlear implant (CI) subjects. The phase width of biphasic pulses at different carrier rates was modulated in the 80 Hz range, resulting in stimulus artifacts that included a frequency component at the modulation frequency (Wilson and Ghassemlooy 1993). As the recorded responses showed nonlinear amplitude growth and the stimulus artifact should have grown linearly with stimulus amplitude, the authors concluded that the responses also included a physiological response. However, no other methods were used to distinguish the artifact component from the physiological response.

Jeng et al. (2007, 2008) described electrical stimulation of guinea pigs with sinusoidally amplitude-modulated (SAM) electric stimuli. Artifacts introduced by nonlinear properties of the stimulation and recording setup at the modulation frequency were reduced by stimulating with alternating polarity. The resulting responses exhibited amplitude growth functions and latencies that are similar to those found in acoustically stimulated subjects.

The objectives of the present study were (1) to reliably show that EASSRs to pulse trains can be recorded and interpreted in CI subjects and (2) to assess the correlation between electrophysiological EASSR thresholds and behavioral T levels. The stimuli were electric pulse trains in the 40 and 80 Hz range that were sent directly to the implant of the test subjects using an experimental test platform. Multiple surface electrodes placed on the scalp of the CI subjects were used to record the evoked responses. Stimulus artifacts were removed in two steps: (1) the responses to pulse trains of the two stimulus polarities were averaged to cancel out polarity-dependent artifacts and (2) the recorded signals were interpolated during the stimulus and power-up pulse artifacts.

To confirm that the recorded EASSRs were of physiological origin, the transient responses to the on- and offset of stimulation were evaluated, and the influence of stimulus parameters such as pulse rate, stimulus intensity and pulse polarity on the variability of the EASSRs was assessed. Latency was compared with data from acoustically evoked ASSRs, and the amplitude growth of the responses was used to estimate electrophysiological thresholds of the EASSRs.

Materials and methods

Subjects

Six subjects took part in the experiments (see Table 1). All subjects were CI patients of the ENT Department of the UZ Leuven University Hospital. They took part voluntarily and signed an informed consent form. The experiments were approved by the medical ethics committee.

Table 1 List of subjects tested

Experimental setup

The stimulation and recording setup (Fig. 1) was controlled by a laptop running a software platform developed in our lab. It was responsible for the generation of the electric stimulation signals as well as the recording of the evoked responses.

FIG. 1

EASSR stimulation and recording setup. POD, programming device; RME, RME Hammerfall DSP Multiface II sound card.

After the specification of stimulation parameters such as pulse rate or stimulation electrode in the graphical interface, the electrical stimulation pulse sequences were generated directly by the software platform without the involvement of any acoustic path. The sequences were then sent to a programming device (POD) connected to an L34 research speech processor provided by Cochlear Ltd. The clinical speech processor of the subjects was not used.

Evoked potential recording was done with surface electrodes placed on the head of the subjects according to positions from the international 10–20 system (Malmivuo and Plonsey 1995). The reference electrode was placed on CZ and the ground electrode on the clavicle contralateral to the side of electrical stimulation. Seven active electrodes were located on the left, high, and right forehead (F7, FZ, and F8), the contralateral mastoid relative to the side of electrical stimulation (TP9 or TP10), and the right, middle, and left back of the head (P4, OZ, and P3). A converter box connected the electrodes to a Jaeger–Toennies eight-channel low-noise differential medical preamplifier with a gain of 50,000. The amplifier contained an internal linear bandpass filter with 6 dB per octave attenuation outside the passband, which was set to 2 to 20,000 Hz. The seven amplified signals and the trigger signal coming from the POD were then fed to an external RME Hammerfall DSP Multiface II sound card, which was connected to the measurement laptop (Van Dun et al. 2008).

The skin beneath the electrodes was scrubbed and thoroughly cleaned, resulting in electrode impedances below 5 kΩ, which was verified with a General Devices EIM-107 Prep-Check Plus EEG electrode impedance meter. The radio frequency (RF) transmission cables of the CI system were kept as far away as possible from the recording leads, which in turn were kept closely together. Recordings were made at a sample rate of 96 kHz. Each recording epoch consisted of 32,768 samples (0.34 s).

During the recordings, subjects were lying on a couch or sitting in a comfortable chair and were asked to move as little as possible. To keep them from falling asleep, a subtitled DVD of their choice was shown.

Stimulus construction

The electric stimuli were trains of symmetric biphasic pulses with a phase width of 40 µs, an interphase gap of 8 µs, and the desired stimulus intensity specified in current units. The desired pulse rate was rounded to a multiple of the recording epoch rate of 2.93 Hz. The L34 processor was configured to generate a trigger pulse at the beginning of each epoch, which was recorded by the RME sound card (Fig. 1). Multiple epochs were grouped into stimulation sweeps. Power-up pulses were inserted between the stimulus pulses if the time between pulses was longer than the maximum period possible for the implant, i.e., 13 ms for Cochlear Nucleus implants.
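The rounding of the pulse rate and the insertion of power-up pulses can be illustrated with a minimal Python sketch. The sample rate, epoch length, and 13 ms limit are taken from the text; the function names and the even spacing of the power-up pulses within a stimulus period are assumptions made for illustration, not a description of the software platform actually used.

```python
import math

SAMPLE_RATE_HZ = 96_000
EPOCH_SAMPLES = 32_768
EPOCH_RATE_HZ = SAMPLE_RATE_HZ / EPOCH_SAMPLES   # 2.9296875 Hz, i.e. about 2.93 Hz
MAX_PULSE_PERIOD_S = 0.013                       # limit quoted for Nucleus implants


def round_pulse_rate(desired_hz):
    """Round a desired rate to an integer number of pulses per recording epoch."""
    pulses_per_epoch = max(1, round(desired_hz / EPOCH_RATE_HZ))
    return pulses_per_epoch, pulses_per_epoch * EPOCH_RATE_HZ


def power_up_offsets(pulse_period_s, max_period_s=MAX_PULSE_PERIOD_S):
    """Offsets (s) of power-up pulses inside one stimulus period.

    Assumption: the period is split into the smallest number of equal
    gaps that are no longer than max_period_s.
    """
    n_gaps = math.ceil(pulse_period_s / max_period_s)
    return [k * pulse_period_s / n_gaps for k in range(1, n_gaps)]


pulses, rate = round_pulse_rate(41)      # nominal 41 Hz stimulus
offsets = power_up_offsets(1 / rate)     # one power-up pulse mid-period
```

With these numbers a nominal 41 Hz stimulus becomes 14 pulses per epoch, and its period of about 24 ms needs one power-up pulse, whereas a 90 Hz train (period about 11 ms) needs none.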

To reduce measurement time, stimuli consisting of two pulse trains with different pulse rates presented simultaneously on two different stimulation electrodes were used. For such interleaved stimuli, a second pulse train was generated and merged with the first one. Because the stimulation of multiple electrodes at a time would lead to increased current spread and is not supported in current Cochlear Nucleus implants, the second pulse train was shifted with a fixed amount of time to prevent overlapping stimulus pulses (see Fig. 2 for an example). Most recordings were made with a fixed time shift of 1,000 µs for the second pulse train, which was reduced to a minimum of 300 µs for some recordings if it was necessary to ensure nonoverlapping pulses for the given pulse rates.

FIG. 2

Example construction of a nonoverlapping interleaved pulse train from two single pulse trains of 3 and 4 Hz without power-up pulses. T_p, minimal distance greater than zero between two pulses of the two unshifted pulse trains; T_s, time shift for the second pulse train.
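The construction of an interleaved stimulus can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, pulse onsets are assumed to be evenly spaced within an epoch, and two pulses are treated as overlapping when their onsets are closer than one biphasic pulse duration (two 40 µs phases plus the 8 µs interphase gap), which ignores any additional implant timing overhead.

```python
from fractions import Fraction

PULSE_DURATION_US = 2 * 40 + 8                    # biphasic pulse: 88 µs
EPOCH_US = Fraction(32_768 * 1_000_000, 96_000)   # one recording epoch in µs


def pulse_onsets_us(pulses_per_epoch, epoch_us=EPOCH_US):
    """Evenly spaced pulse onsets (µs) within one epoch."""
    return [epoch_us * k / pulses_per_epoch for k in range(pulses_per_epoch)]


def interleave(onsets_a, onsets_b, shift_us, epoch_us=EPOCH_US):
    """Merge two pulse trains after shifting the second one by shift_us.

    Raises ValueError if any two pulses would overlap, including across
    the boundary between consecutive epochs.
    """
    merged = sorted([(t, "A") for t in onsets_a] +
                    [(t + shift_us, "B") for t in onsets_b])
    gaps = [t1 - t0 for (t0, _), (t1, _) in zip(merged, merged[1:])]
    gaps.append(merged[0][0] + epoch_us - merged[-1][0])   # wrap-around gap
    if min(gaps) < PULSE_DURATION_US:
        raise ValueError("pulses overlap; choose a different time shift")
    return merged


train_35 = pulse_onsets_us(12)   # about 35 Hz
train_41 = pulse_onsets_us(14)   # about 41 Hz
stimulus = interleave(train_35, train_41, shift_us=1000)
```

Exact fractions are used so that coinciding pulses of the two unshifted trains (for example, the first pulse of every epoch) are detected reliably rather than being hidden by floating-point rounding.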

T and C levels for all stimulation electrodes used in the experiments were determined behaviorally for each frequency range and stimulation mode. The tested stimulation electrodes were chosen from different parts of the electrode array that was used by the clinical fitting of the subjects and can be found in Table 1. The selection of the stimulation electrodes for the experiments is described in more detail in the “Experiments” section. The comfort level (C level) was chosen as the stimulus intensity between good and loud on a loudness scale of inaudible, very soft, soft, good, loud, very loud, and intolerable. The threshold level (T level) was determined as the stimulus intensity rated between inaudible and very soft.

Stimulus artifact

EEG recordings of responses to electrical stimulation are contaminated with artifacts from stimulation and power-up pulses. Without stimulus artifact removal, the artifacts could cause the erroneous detection of neural responses and could distort response properties such as shape, amplitude, and phase. The shape and behavior of these artifacts is determined by the electrical properties of the components involved:

RF transmission

A transcutaneous inductive link with a 2.5 or 5 MHz RF carrier (Clark 2003) is used to supply Nucleus CIs with power and to transmit stimulus and telemetry data to and from the implant. It encodes information such as stimulation mode, amplitude, phase width, and phase gap. Pulse periods longer than about 13 ms are split by nonstimulating power-up pulses to supply the implant with power.

Capacitive coupling between the RF transmission cable and EEG recording leads is one possible source of recording artifacts. The recording leads are unshielded to reduce the capacitance to ground and are therefore affected by capacitive coupling with the RF transmission cable. The resulting artifact depends on the mismatch of the recording electrode impedances, is insensitive to stimulus polarity, and can be the reason for a DC offset in the stimulus artifact. To reduce this artifact component, recording electrode leads were kept close together and as far away from the RF transmission cable as possible, and the skin was thoroughly scrubbed to reduce the capacitance and impedance mismatch between the electrodes.

CI

Nucleus CIs use a combination of voltage-controlled and constant-current stimulation (Del Pozo and Delgado 1978; Knaflitz and Merletti 1988; Whittington et al. 2003, 2005). Constant-current stimulation during the stimulus pulses allows for better control of current density and charge delivery than voltage-controlled stimulation as it is not dependent on the impedance of the tissue between the electrodes (Spencer 1981). However, it is not used between stimulus pulses as it would limit the discharge of the electrode-tissue interface, which would cause distortions of the neural response recordings even after the electrical pulse itself. Instead, a low impedance path is used when no stimulation is ongoing, which allows for fast draining of excessive charge buildup.

Biphasic charge-balanced pulses were used to electrically stimulate the nerve fibers, with the second phase undoing any reversible reactions from the first phase (Lilly et al. 1955) and reducing the charge buildup on the electrode-tissue junction. Alternative pulse shapes such as triphasic pulses would have made it possible to further minimize the polarization artifact on the electrodes and reduce the artifact tail duration (Curtis et al. 1991) but were not available in the software used.

Voltage gradients and body capacitance

Voltage differences produced by the current flowing through body tissue are the main source of the electrical artifacts. The fluids, tissues, and bones inside and outside the cochlea can be modeled as a purely resistive volume conductor (McGill et al. 1982). Because of its purely resistive nature, this artifact source is not present if no current is applied, i.e., in the pulse gap or between pulses. Distant placement of the recording electrodes from the stimulation site will reduce this artifact component.

The mostly capacitive impedance between body and ground is charged or discharged for each transient of the stimulus pulses. The resulting equipotential lines are different from the ones produced by the stimulus current and therefore make it impossible to align recording electrodes on equipotential lines for both of them. This artifact component is not significantly attenuated by distance from the stimulation site and may be reduced by deliberately grounding the subject. Grounding may not be advisable with regard to patient safety, although the seriousness of the risk can be debated (Scott et al. 1997).

Amplifier

The medical preamplifier used featured high input impedances and a high common-mode rejection ratio. The linear high-pass filters limited low-frequency noise caused by slow potential changes such as amplifier drift, and the linear low-pass filters limited high-frequency thermal noise (McGill et al. 1982).

Artifact reduction

As neural responses can be evoked by stimulus pulses of both polarities, pulse polarity was alternated between recording sweeps, and sweeps of both polarities were averaged to reduce the influence of the artifacts caused by the stimulus pulses. The remaining artifacts from stimulus pulses and RF transmission were further reduced as described below so that it was possible to eliminate their influence on the obtained results.
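The effect of averaging both polarities can be shown with a short sketch. It assumes an idealized recording in which the neural response is identical for both polarities while the stimulus artifact simply changes sign; real artifacts contain polarity-insensitive components (for example from RF coupling) that this step does not remove. The function name and the numbers are illustrative only.

```python
def average_polarities(sweep_pos, sweep_neg):
    """Sample-wise mean of sweeps recorded with opposite pulse polarity."""
    return [(p + n) / 2 for p, n in zip(sweep_pos, sweep_neg)]


# Idealized example: identical response, sign-inverted artifact (values in µV).
response = [0.10, 0.20, -0.10, 0.05]
artifact = [50.0, -30.0, 5.0, 0.0]
sweep_pos = [r + a for r, a in zip(response, artifact)]
sweep_neg = [r - a for r, a in zip(response, artifact)]

cleaned = average_polarities(sweep_pos, sweep_neg)   # close to `response`
```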

Different approaches for artifact removal can be found in the literature, ranging from the use of linear and nonlinear filters (Whittington et al. 2003, 2005; Sennels et al. 1997; Parsa et al. 1998; Gnadt et al. 2003; Liang and Lin 2002) to the subtraction of artifact templates generated by a variety of methods (Hashimoto et al. 2002; Lin et al. 1996; Lin and McCallum 1998; Wagenaar and Potter 2002; Litvak et al. 2003; Yuwaraj and Kunov 1995; Grieve et al. 2000; Blogg and Reid 1990). For multielectrode recordings with a high number of recording electrodes, blind source separation, independent component analysis, and optimized differential reference techniques allow for the spatial localization of artifact sources and potentials (Gilley et al. 2006; Castañeda-Villa and James 2007), which makes it possible to select electrodes that are less affected by artifacts and to remove artifact components from these recordings.

The artifacts in the EASSR recordings here are caused by the stimulus pulses and the RF transmission and are, therefore, synchronous to the expected response. Artifacts vary between subjects and depend on the position and impedance of the recording electrodes. Time-domain shape and duration of the artifact components introduced directly by the stimulus pulses are influenced by stimulation parameters such as pulse width, stimulation mode, stimulus polarity, and current level. Artifact components introduced by the RF transmission for power-up and stimulus pulses vary with pulse width and the amount of energy transferred to the implant. Because of the complicated interactions between the involved parameters, the subtraction of artifact templates obtained from recordings with stimulation at subthreshold level would have resulted in insufficient artifact removal. Moreover, as the artifacts share frequency components with the neural response, simple time- or frequency-domain filtering could not be applied.

Figure 3 shows the first 500 µs of the averaged time-domain signal after a stimulus pulse occurred and the overall amplitude spectrum of the response for a recording of subject S5 on electrodes TP9 and P4 for bipolar stimulation at a pulse rate of 90 Hz and a current level of 230 cu. As can be seen from the time-domain signal, artifacts were strictly time-limited for our experimental setup (see the middle plot in the lower row of Figure 4 for a zoomed version of such an artifact) and contained energy only during the first several hundred microseconds.

FIG. 3

Time-domain signal and amplitude spectrum for recordings on electrodes TP9 and P4 for bipolar stimulation of subject S5 at a pulse rate of 90 Hz and a current level of 230 cu. Shown are the time-domain shape and the amplitude spectrum of the averaged EEG after averaging of alternating polarities, the interpolated EEG, and the artifact alone. The spectrum shows only frequency bins corresponding to harmonics of the stimulation pulse rate. The duration of interpolation was 500 µs.

FIG. 4

Responses and stimulus artifacts for the on–off recordings at C level of subject S5 on electrode P4 at 41 Hz. The top row gives an overview of the shape of the recorded signal, while the lower plots zoom in on one of the stimulus artifacts. The left column shows the recorded signal for a pulse train of one polarity. The middle column shows the average of the recorded signal for both polarities. The right column shows the signal after signal interpolation. The top right plot contains the low-pass filtered signal (cutoff frequency 100 Hz) in black and the unfiltered signal in gray.

As the artifacts from stimulus and power-up pulses were limited in time and the expected pulse positions were known, artifact removal was implemented by the approximation of the neural responses during the artifact duration with a first-order polynomial. For an artifact between t_b and t_e, the interpolated signal x_i(t) and the artifact amplitude a(t) at time t could then be calculated from the recorded signal x(t) with

$$ \begin{array}{l} x_{\rm i}(t) = \dfrac{t_{\rm e} - t}{t_{\rm e} - t_{\rm b}}\, x(t_{\rm b}) + \dfrac{t - t_{\rm b}}{t_{\rm e} - t_{\rm b}}\, x(t_{\rm e}) \\ a(t) = x(t) - x_{\rm i}(t) \end{array} $$

The amplitude spectrum of such time-limited artifacts alone can be seen in Figure 3. Above 3 kHz, no substantial difference between recorded and artifact spectrum is visible, which is consistent with the absence of neural activity above 2 to 3 kHz in the spectrum of background EEG (Boston 1981). Under ideal conditions, artifacts are symmetric with a DC amplitude close to zero (Fig. 3, electrode TP9). For low pulse rates, such artifacts have little frequency content in the frequency bin of the expected response and cause only negligible distortion. Asymmetric artifacts (Fig. 3, electrode P4), however, can contain artifact components of 100 nV or more in the frequency bin of the pulse rate.

The duration of interpolation was 500 µs for power-up and bipolar stimulus pulses and 700 µs for monopolar stimulus pulses. The resulting response shapes were visually checked for any remaining artifact and the interpolation duration altered if necessary, which was the case for less than 3% of the recordings.
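A minimal sketch of this interpolation step is given below. It replaces the samples between two indices with a straight line through the recorded values at the beginning and the end of the artifact and returns the removed artifact as the difference. The function names and the index-based interface are assumptions made for illustration.

```python
SAMPLE_RATE_HZ = 96_000


def us_to_samples(duration_us, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert an interpolation duration in µs to a number of samples."""
    return round(duration_us * 1e-6 * sample_rate_hz)


def interpolate_artifact(x, i_b, i_e):
    """Linearly interpolate x between indices i_b and i_e (inclusive).

    Returns (x_i, a): the interpolated signal and the removed artifact
    a = x - x_i. The line passes through x[i_b] and x[i_e].
    """
    x_i = list(x)
    span = i_e - i_b
    for i in range(i_b, i_e + 1):
        w = (i - i_b) / span                 # 0 at i_b, 1 at i_e
        x_i[i] = (1 - w) * x[i_b] + w * x[i_e]
    a = [orig - interp for orig, interp in zip(x, x_i)]
    return x_i, a


n_interp = us_to_samples(500)   # 48 samples at 96 kHz
```

At the recording sample rate of 96 kHz, the 500 µs and 700 µs interpolation durations correspond to 48 and 67 samples, respectively.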

Epochs with recording artifacts (e.g., caused by muscle movements) were removed from the analysis. Recording artifact rejection levels were adjusted in a way that about 5% of the recorded epochs were rejected. Rejection was based on the peak-to-peak amplitudes of the raw epochs, which had the artifact peaks caused by the stimulus and power-up pulses removed by the use of signal interpolation as described above.
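One way to obtain a rejection rate of about 5% is to rank the epochs by peak-to-peak amplitude and discard the largest ones. The sketch below does exactly that; it is an assumption about how the rejection level could be set, since the text states only the target rate and the peak-to-peak criterion. The input epochs are expected to have the stimulus and power-up artifacts already interpolated.

```python
def reject_epochs(epochs, reject_fraction=0.05):
    """Drop the epochs with the largest peak-to-peak amplitude.

    epochs: list of equally long sample lists (artifact-interpolated).
    Returns the retained epochs in their original order.
    """
    peak_to_peak = [max(e) - min(e) for e in epochs]
    n_reject = round(reject_fraction * len(epochs))
    by_amplitude = sorted(range(len(epochs)), key=lambda i: peak_to_peak[i])
    keep = sorted(by_amplitude[:len(epochs) - n_reject])
    return [epochs[i] for i in keep]
```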

Response measurements

After the removal of stimulus and recording artifacts, epochs were grouped into recording sweeps according to their pulse polarity and averaged in the time domain. A fast Fourier transform (FFT) was used to calculate the complex frequency spectrum, with a resulting frequency resolution of about 0.046 Hz for sweeps consisting of 64 epochs (the inverse of the 21.8 s sweep length). Response amplitude and phase were obtained from the corresponding FFT bin, and the presence of a neural response was determined with an F test (Lins et al. 1996) relative to the noise at adjacent frequencies. The noise estimate was calculated from 120 adjacent frequency bins, i.e., about 3 Hz below and above the pulse rate of the stimulus. For interleaved stimulation, frequency bins that contained another stimulus were left out. A significance level of p < 0.05 was used for all tests.
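The detection step can be sketched in plain Python. The sketch assumes the commonly used form of the test, in which the power in the response bin is divided by the mean power of the M neighboring bins and the ratio is compared with an F distribution with 2 and 2M degrees of freedom; for these degrees of freedom the p value has the closed form (1 + F/M)^(−M). The function names, the direct single-bin DFT, and the synthetic test signal are illustrative assumptions.

```python
import cmath
import math
import random


def dft_bin(x, k):
    """Complex amplitude of DFT bin k, scaled so a unit cosine has magnitude 1."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
    return 2 * acc / n


def f_test(x, k_signal, n_side=60, skip=()):
    """F ratio and p value of bin k_signal against 2 * n_side adjacent bins.

    Bins listed in `skip` (e.g. the other stimulus of an interleaved pair)
    are excluded from the noise estimate.
    """
    noise_bins = [k for k in range(k_signal - n_side, k_signal + n_side + 1)
                  if k != k_signal and k not in skip]
    m = len(noise_bins)
    noise_power = sum(abs(dft_bin(x, k)) ** 2 for k in noise_bins) / m
    f_ratio = abs(dft_bin(x, k_signal)) ** 2 / noise_power
    p_value = (1 + f_ratio / m) ** (-m)      # survival function of F(2, 2m)
    return f_ratio, p_value


# Synthetic example: unit-amplitude cosine in bin 100 plus Gaussian noise.
rng = random.Random(0)
n = 512
x = [math.cos(2 * math.pi * 100 * i / n) + rng.gauss(0, 0.5) for i in range(n)]
f_ratio, p_value = f_test(x, 100)
response_detected = p_value < 0.05
```

For real sweeps of 64 × 32,768 samples an FFT routine would be used instead of the direct sum, which is kept here only to make the example self-contained.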

Apparent latency (group delay) was used to determine the delay introduced by the auditory system (John and Picton 2000). The phase values of the response as obtained from the FFT were corrected by π/2 because the spectrum was based on cosine terms, and the corrected phase values were negated to obtain phase delay. Apparent latency per recording electrode was determined from the regression of phase delay versus stimulus pulse rate by dividing the slope of the regression line by 2π, while only considering electrodes with at least three significant responses (p < 0.05) at different pulse rates. The latency for one recording was calculated by averaging the individual latencies for each recording electrode where the linear regression explained more than 95% of the variance.
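The latency estimate can be sketched as a least-squares regression of phase delay on pulse rate. In the sketch below the π/2 correction is added before negation and the phase delays are unwrapped so that consecutive values differ by less than π; both the sign convention and the unwrapping rule are assumptions, as is the function name. The constant correction does not affect the slope and therefore not the latency.

```python
import math


def apparent_latency(rates_hz, fft_phases_rad):
    """Return (latency in s, R^2) from FFT phases at several pulse rates."""
    delays = []
    for phase in fft_phases_rad:
        delay = -(phase + math.pi / 2)           # corrected and negated phase
        if delays:                               # unwrap relative to previous
            delay += 2 * math.pi * round((delays[-1] - delay) / (2 * math.pi))
        delays.append(delay)
    n = len(rates_hz)
    mean_f = sum(rates_hz) / n
    mean_d = sum(delays) / n
    sxx = sum((f - mean_f) ** 2 for f in rates_hz)
    syy = sum((d - mean_d) ** 2 for d in delays)
    sxy = sum((f - mean_f) * (d - mean_d) for f, d in zip(rates_hz, delays))
    slope = sxy / sxx                            # rad per Hz
    r_squared = sxy ** 2 / (sxx * syy)
    return slope / (2 * math.pi), r_squared
```

An electrode would contribute to the averaged latency only if the returned R² exceeds 0.95 and at least three significant responses at different pulse rates are available.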

Electrophysiological thresholds were derived from EASSRs at decreasing current levels using two approaches. In the first approach, the significance of the responses was used as the determining factor (threshold bracketing). For all recording electrodes that had responses with a reasonable amplitude growth function, i.e., a predominantly monotonically increasing response amplitude with stimulus intensity, the mean of the current levels of the last significant and the first nonsignificant response was used as the determined threshold.

In the second approach, the last two significant responses were linearly extrapolated until the response amplitude would have reached zero, with the resulting current level being the determined threshold. If only one significant response was available or the resulting threshold was lower than the current level of the first nonsignificant response, the threshold was determined to be the current level corresponding to the first nonsignificant response. The objectively determined threshold for a recording was calculated by averaging the individual thresholds for all recording electrodes where a threshold could be determined as described above.
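Both threshold estimates can be sketched for a single recording electrode as follows. The sketch assumes that the current levels are given in decreasing order, that all responses above the first response failing the significance test are significant, and that a non-increasing growth between the last two significant responses falls back to that first failing level; the function names and example values are hypothetical.

```python
def threshold_bracketing(levels, significant):
    """Midpoint between the last significant level and the next lower level."""
    for upper, lower, sig_u, sig_l in zip(levels, levels[1:],
                                          significant, significant[1:]):
        if sig_u and not sig_l:
            return (upper + lower) / 2
    return None                                  # no bracket available


def threshold_extrapolation(levels, amplitudes, significant):
    """Extrapolate the last two significant responses to zero amplitude."""
    if not significant[0] or all(significant):
        return None
    first_ns = significant.index(False)
    floor = levels[first_ns]                     # level of first failing response
    if first_ns < 2:                             # only one significant response
        return floor
    l_hi, a_hi = levels[first_ns - 2], amplitudes[first_ns - 2]
    l_lo, a_lo = levels[first_ns - 1], amplitudes[first_ns - 1]
    slope = (a_hi - a_lo) / (l_hi - l_lo)
    if slope <= 0:                               # assumption: no usable growth
        return floor
    return max(l_lo - a_lo / slope, floor)


# Hypothetical growth function: levels in current units, amplitudes in nV.
levels = [200, 180, 160, 140, 120]
amplitudes = [300.0, 220.0, 100.0, 30.0, 20.0]
significant = [True, True, True, False, False]

t_bracket = threshold_bracketing(levels, significant)
t_extrap = threshold_extrapolation(levels, amplitudes, significant)
```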

Experiments

Unless otherwise noted in the experiment description, recording epochs were grouped into sweeps consisting of 64 epochs with a sweep length of 21.8 s. Pulse polarity was alternated between stimulation sweeps. For subject S1, each recording consisted of 16 sweeps, i.e., eight sweeps for each pulse polarity, with a total length of 349 s. For subjects S2 to S6, 12 sweeps with a total length of 262 s were recorded.

Behavioral threshold and comfort levels were determined at 41 Hz as described in the “Stimulus construction” section. In bipolar mode, three stimulation electrodes were selected from the part of the electrode array that was used by the subject’s clinical fitting. The three electrodes were chosen to be close to the apex, the base, and the middle of the usable part of the electrode array. The electrode impedances from the clinical fitting were also considered in the electrode selection. Bipolar stimulation started at BP+3 and was increased to BP+4 if the maximum current level was reached before the C level could be determined. Afterwards, two of the three electrodes were chosen so as to maximize the difference in behavioral T levels between them. Additionally, one electrode in the middle of the usable part of the electrode array was fitted in monopolar mode MP1+2. The stimulation electrodes used for bipolar and monopolar mode can be seen in Table 1. Not all measurements could be done for all subjects because of testing time constraints.

Artifact reduction

To evaluate response shape, amplitude growth, and artifact influence, measurements with subject S5 at a pulse rate of 41 Hz with various current levels were made. The current level was set to C−30% behavioral dynamic range (BDR), C−50% BDR, C−70% BDR, and C−90% BDR, with C being the maximum comfortable level for the subject. Each recording consisted of 16 sweeps, i.e., eight sweeps for each pulse polarity, with a total length of 349 s.
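The notation for stimulus intensities relative to the behavioral dynamic range can be made concrete with a small sketch. It assumes that the BDR is the difference between the behavioral C and T levels, both expressed in clinical current units; the function names and the example levels are hypothetical.

```python
def level_below_c(c_level, t_level, percent_bdr):
    """Current level for 'C - percent_bdr % BDR', with BDR = C - T (assumed)."""
    bdr = c_level - t_level
    return c_level - percent_bdr * bdr / 100


def level_below_t(c_level, t_level, percent_bdr):
    """Current level for 'T - percent_bdr % BDR', with BDR = C - T (assumed)."""
    bdr = c_level - t_level
    return t_level - percent_bdr * bdr / 100


# Hypothetical subject with T = 100 cu and C = 200 cu.
series = [level_below_c(200, 100, p) for p in (30, 50, 70, 90)]
```

For this hypothetical subject the four stimulus intensities are 170, 150, 130, and 110 cu, and a level of T−25% BDR would correspond to 75 cu.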

On–off responses

The transient responses to the on- and offset of stimulation were measured for subject S5 at a pulse rate of 41 Hz. The current level was set to the maximum comfortable level for the subject. Each recording sweep consisted of two epochs that contained stimulus pulses and two epochs that had all stimulus pulses replaced by power-up pulses, resulting in a sweep length of 1.37 s with stimulation occurring for 0.68 s. The recording consisted of 256 sweeps of each pulse polarity with a total length of 699 s.

Response amplitude

Stimulus intensity was varied to analyze response amplitude and artifact growth for subjects S1 to S6. Stimulation in bipolar mode was done on two electrodes, with one of the electrodes stimulated at 35 Hz and the other at 41 Hz, at current levels of C−25% BDR, C−50% BDR, C−75% BDR, at T level and at T−25% BDR. Two measurements were made where each of the stimulation electrodes was stimulated separately. A third measurement used interleaved stimulation on both electrodes at the same time to estimate the effect of simultaneous stimulation, with the same electrodes stimulated at 35 and 41 Hz as in the two measurements with single stimulation. A Wilcoxon signed rank test was used to determine the presence of an effect of interleaved stimulation on response amplitude by comparing the amplitudes of significant responses to single and interleaved stimulation on the same stimulation electrode, recording electrode and pulse rate at C−25% BDR.
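A paired comparison of this kind can be sketched with an exact Wilcoxon signed rank test in plain Python. The sketch drops zero differences, assigns average ranks to ties, and derives a two-sided p value from the exact sign-permutation distribution; whether the original analysis used an exact or an approximate p value is not stated, so this is an assumption, and the function name is hypothetical.

```python
def wilcoxon_signed_rank(a, b):
    """Exact two-sided Wilcoxon signed rank test for paired samples.

    Returns (W_plus, p_value), where W_plus is the sum of the ranks of
    the positive differences a - b.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n                       # ranks doubled to stay integer
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = (i + 1) + (j + 1)   # 2 x average rank of the tie
        i = j + 1
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total2 = sum(ranks2)
    counts = [1] + [0] * total2            # sign patterns per doubled rank sum
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    lower = sum(counts[:w_plus2 + 1])
    upper = sum(counts[w_plus2:])
    p_value = min(1.0, 2 * min(lower, upper) / 2 ** n)
    return w_plus2 / 2, p_value
```

The two input lists would hold, for example, the amplitudes of significant responses to single and interleaved stimulation, paired by stimulation electrode, recording electrode, and pulse rate.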

For some subjects, monopolar stimulation mode MP1+2 at 35 Hz on one electrode with various stimulus intensities was used to control for the influence of stimulation mode. Stimulation in monopolar mode was done at C−25% BDR, C−50% BDR, C−75% BDR, at T level and at T−25% BDR. A Wilcoxon signed rank test was used to determine the presence of an effect of monopolar stimulation on response amplitude.

Response latency

For subject S5, response latency in two frequency ranges was evaluated for single stimulation. For the 40 Hz range, pulse rates of 35, 38, 41, 44, and 47 Hz were used. For the 80 Hz range, pulse rates of 79, 85, and 91 Hz were used. The current level was set to the maximum comfortable level for the subject. For the recordings in the 80 Hz range and for the recordings at 38, 44, and 47 Hz, recording epochs were grouped into sweeps consisting of 64 epochs with a total length of 21.8 s. Each recording consisted of 16 sweeps, i.e., eight sweeps for each pulse polarity, with a total length of 349 s. Recordings at 35, 41, and 47 Hz were made with the same total length, but with sweeps consisting of eight epochs with the last two epochs having all stimulation pulses replaced by power-up pulses.

For all subjects, response latency in the 40 Hz range for interleaved stimulation was determined by complementing the interleaved measurements at C−25% BDR with three more interleaved measurements. For the first measurement, the stimulation electrodes were swapped, so that the electrode that was stimulated before at 35 Hz was now stimulated at 41 Hz. Two more measurements were done with 38 and 44 Hz. A Wilcoxon signed rank test was carried out to assess the influence of the stimulation electrode on the obtained latencies and a Friedman test for the influence of the recording electrode. Additionally, the presence of an effect of interleaved and monopolar stimulation on phase delay compared to single bipolar stimulation was determined by a Friedman and a Wilcoxon signed rank test, respectively. The interleaved and monopolar phase delays were adjusted by multiples of 2π to be in the range of the phase delays for single bipolar stimulation ±π.

Electrophysiological thresholds

Electrophysiological thresholds were derived from the amplitude growth of the EASSRs for all subjects and conditions where measurements at at least four different stimulus intensities were available from the experiments above. This was the case for 26 measurement conditions for the six subjects. Amplitude growth functions were available in six subjects for one interleaved and two single bipolar measurements at 35 and 41 Hz and in two subjects for one single monopolar measurement at 35 Hz. The estimated electrophysiological thresholds were compared between single and interleaved stimulation with a paired t test.

Results

Artifact reduction

Figure 4 shows a section of the on–off recordings at C level for subject S5 on electrode P4 at 41 Hz. The plots at the top give an overview of the shape of the recorded signal, while the lower plots zoom in on one of the stimulus artifacts. The first column presents the recorded responses to stimuli of one polarity. Visible are the stimulus artifacts locked to the stimulus pulse positions with peak amplitudes of about 90 µV and a slow-decay component that lasts about 1 ms. The second column shows the averaged responses from both stimulus polarities. The stimulus artifact is reduced to peak amplitudes of about 20 µV, and the slow-decay component is completely removed from the recorded signal. In the third column, the signal has been linearly interpolated during the artifact. The necessary interpolation duration varied depending on stimulation mode, subject, and recording electrode from below 300 µs to about 1 ms. After interpolation, the electrophysiological response with peak amplitudes of about 500 nV is clearly visible.

Figure 5 shows the relation between the current corresponding to the stimulus intensity used, response amplitude, and artifact amplitude on all electrodes for subject S5 for stimulus intensities of C−30% BDR, C−50% BDR, C−70% BDR, and C−90% BDR. Artifact amplitude is the peak-to-peak amplitude of the remaining stimulus artifact after averaging of alternating polarities but before the signal interpolation, which varies with recording electrode and stimulus intensity. The left plot shows that the artifact amplitude grows linearly with stimulus current at lower stimulus intensities but saturates at higher ones, indicating that the current source in the implant was also saturated and the implant was out of compliance. In the right plot, response amplitude is plotted relative to artifact amplitude, which is proportional to the effective stimulus current even if the implant is out of compliance. Response amplitude depends on stimulus intensity and recording electrode, is below noise level for lower stimulus intensities, and shows a sharp increase at intensities above the electrophysiological threshold.

FIG. 5

Relation between stimulus intensity (plotted as the corresponding stimulus current), peak-to-peak stimulus artifact amplitude after averaging of alternating polarities, and response amplitude of subject S5 for stimulus intensities of C−30% BDR, C−50% BDR, C−70% BDR, and C−90% BDR at 41 Hz. Data points connected only by dashed lines indicate nonsignificant responses.

On–off responses

Figure 6 shows representative on–off responses for subject S5 on electrode TP9 at 41 Hz. Clearly visible are the late latency responses (LLRs) in response to stimulus onset and offset, with repeated MLRs superimposed. Although the EASSRs seem to start nearly immediately, it takes about 60 ms relative to the onset of stimulation until steady state conditions are reached for this subject. After the response to the last stimulus pulse, one additional response is visible after about 40 ms.

FIG. 6

Average of 512 sweeps of on–off responses of subject S5 on electrode TP9 for stimulation at C level with a pulse rate of 41 Hz after artifact removal, lowpass filtered with a cutoff frequency of 100 Hz.

Response amplitude

The mean amplitude of significant responses to bipolar stimuli at C−25% BDR per recording electrode for all subjects can be seen in Figure 7 (n = 257 from six subjects on seven recording electrodes, n = 52 for two single bipolar measurements per subject and n = 205 for four interleaved bipolar measurements per subject, significant responses in about 65% of the measurements). Responses measured on the frontal electrodes were smaller than those measured on electrodes at the back of the head, and the position at the contralateral mastoid yielded the largest EASSRs. Significant responses could be very reliably obtained from electrodes at the back of the head or the contralateral mastoid (75% to 91% of all measurements with at least one significant response), but only in some cases from the forehead positions (34% to 61%).

FIG. 7

Box plot for the amplitudes of significant responses to bipolar stimuli at C−25% BDR per recording electrode for all subjects (n = 257). For subjects with a CI in the right ear, electrodes F7, P4, and TP10 were swapped with electrodes F8, P3, and TP9, respectively. Percentages show the number of significant responses per electrode relative to all recordings with at least one significant response.

Figure 8 compares significant EASSR amplitudes across all electrodes for different stimulation modes (n = 290 from six subjects on seven electrodes, n = 52 for two single bipolar measurements per subject, n = 205 for four interleaved bipolar measurements per subject, and n = 33 for one single monopolar measurement per subject). The median of all significant EASSR amplitudes across all subjects and stimulation electrodes in response to stimuli at a current level corresponding to C−25% BDR was 105 nV (interquartile range IR = 95 nV) for single bipolar stimuli, 105 nV (IR = 113 nV) for interleaved bipolar stimuli, and 149 nV (IR = 258 nV) for single monopolar stimuli.

FIG. 8

Box plot comparing the amplitudes of significant responses to stimuli at C−25% BDR for different stimulation modes (n = 290).

When stimulating on the same electrode, there was a significant effect of interleaved stimulation on amplitude (z = −4.34, T = 9, p < 0.001, r = −0.50), with response amplitudes to interleaved bipolar stimuli at C−25% BDR being 17.2 nV (median, IR = 27.0 nV) smaller than responses to single bipolar stimuli. There was no significant effect on response amplitude when single and interleaved bipolar stimulation were compared across different stimulation electrodes (T = 9, p = .22). Monopolar stimulation resulted in a significant median increase of response amplitude of 50.5 nV (IR = 155 nV, z = −2.46, T = 6, p < 0.02, r = −0.35).
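The statistics above (T, z, and effect size r) have the form of a paired signed-rank comparison. As a minimal sketch, assuming a Wilcoxon signed-rank test with the normal approximation and the common convention r = z/√N (N being the total number of observations), the quantities could be computed as follows. The function names and the amplitude values in the usage note are hypothetical, and no tie or continuity correction is applied to the variance.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Return (T, z) for paired samples x, y (normal approximation, no tie correction)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    # Rank the absolute differences, assigning average ranks to ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t = min(t_plus, t_minus)  # test statistic T
    mean_t = n * (n + 1) / 4
    sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t, (t - mean_t) / sd_t

def effect_size_r(z, n_observations):
    """Effect size r = z / sqrt(N); the choice of N is an assumed convention."""
    return z / math.sqrt(n_observations)
```

With hypothetical amplitudes in nV, `wilcoxon_signed_rank([120, 98, 143, 110, 87, 131], [100, 90, 130, 112, 70, 105])` yields T = 1, since only one difference is negative and it has the smallest magnitude.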

Example response shapes for different current levels and an amplitude growth function for the responses of subject S3 on recording electrode P3 for bipolar stimulation at 35 Hz can be seen in Figure 9. Response amplitudes grew nonlinearly and did not reach saturation at the highest current level for 22 out of the 26 measurements (single and interleaved bipolar stimulation on two different electrodes in six subjects, monopolar stimulation in two subjects).

FIG. 9

Response shapes and amplitude growth function of subject S3 for single bipolar stimulation at 35 Hz recorded on electrode P3.

Response latency

Apparent latencies calculated from the responses on recording electrodes TP9, P4, and OZ for the latency experiments for single bipolar stimulation with subject S5 can be seen in Figure 10. For the 40 and 80 Hz range, latencies of about 37 and 12 ms, respectively, were obtained. The regression line explains more than 95% of the variation on the recording electrodes (90% for the 40 Hz range on electrode P4). For the 40 Hz range, the influence of stimulus polarity, noise, and recording electrode is shown in Figure 11. Phase delay corresponds to the angle of the response in the polar plot, is proportional to the pulse rate of the stimulation, and similar for different recording electrodes at the same pulse rate.

FIG. 10

Apparent latencies on recording electrodes TP9, P4, and OZ for single bipolar stimulation at C level for subject S5 for the 40 and 80 Hz range. For the 40 Hz range, pulse rates were 35, 38, 41, 44, and 47 Hz. For the 80 Hz range, pulse rates were 79, 85, and 91 Hz. Error bars are based on the possible change in phase delay that could be introduced by the average noise amplitude from the neighboring frequency bins and were not used for the regression calculation.
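Apparent latency corresponds to the slope of phase delay over stimulation rate. The following Python sketch illustrates such a calculation under stated assumptions: phase delays are given in radians, adjacent pulse rates are close enough that the true phase step is below π (so simple unwrapping is valid), and an ordinary least-squares line is fitted. The function name and return convention are hypothetical and are not taken from the study's analysis code.

```python
import math

def apparent_latency(rates_hz, phase_delays_rad):
    """Return (latency in seconds, R^2) from a linear fit of phase delay vs. rate."""
    # Unwrap: remove 2*pi jumps between neighbouring rates.
    unwrapped = [phase_delays_rad[0]]
    for p in phase_delays_rad[1:]:
        prev = unwrapped[-1]
        while p - prev > math.pi:
            p -= 2 * math.pi
        while p - prev < -math.pi:
            p += 2 * math.pi
        unwrapped.append(p)
    n = len(rates_hz)
    mean_x = sum(rates_hz) / n
    mean_y = sum(unwrapped) / n
    sxx = sum((x - mean_x) ** 2 for x in rates_hz)
    syy = sum((y - mean_y) ** 2 for y in unwrapped)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rates_hz, unwrapped))
    slope = sxy / sxx                 # radians per Hz
    r_squared = sxy ** 2 / (sxx * syy)  # fraction of variation explained
    return slope / (2 * math.pi), r_squared
```

The returned R² can be compared against a criterion such as the 95% of explained variation mentioned in the text before a latency estimate is accepted.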

FIG. 11

Polar plot of the complex responses to show polarity, noise, and recording electrode influence on phase delay for subject S5 at C level for the pulse rates 35, 38, and 41 Hz on recording electrodes TP9, P4, and OZ. Error bars show the difference between the polarities; circles denote the noise level derived from the neighboring bins for the averaged case.
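The noise circles in the polar plot suggest a simple geometric bound on phase uncertainty. As a hedged sketch, assuming the noise acts as a vector of fixed amplitude added to the response vector at an arbitrary angle, the largest possible phase change is arcsin(noise/response). This bound is an assumption chosen for illustration and is not necessarily the exact rule used for the error bars; the function name is hypothetical.

```python
import math

def max_phase_error(response_amplitude, noise_amplitude):
    """Worst-case phase change (radians) when a noise vector is added to the response."""
    if noise_amplitude >= response_amplitude:
        return math.pi  # phase is unconstrained when noise can cancel the response
    # The noise circle around the response tip subtends this half-angle at the origin.
    return math.asin(noise_amplitude / response_amplitude)
```

For example, a noise amplitude of half the response amplitude gives a worst-case phase change of π/6 (30°).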

After artifact removal, most of the response differences between stimulation polarities are smaller in amplitude than the noise level of the neighboring bins, with the differences between recording electrodes being the main source of latency jitter. The response shapes for two of the recording electrodes are shown in Figure 12, which also include electrically evoked auditory brainstem responses (EABRs) in the first 10 ms that vary in amplitude relative to the whole responses across electrodes.

FIG. 12

Response shape for subject S5 on electrodes TP9 and P4 for stimulation at C level for the pulse rates 38, 44, 47, 79, 85, and 91 Hz, lowpass filtered with a cutoff frequency of 3,000 Hz. The EABR with peaks eII, eIII, and eV can be seen on both electrodes.

For all subjects, response latencies in the 40 Hz range were calculated for all recordings with interleaved bipolar stimulation at C−25% BDR (Figure 13, n = 44, from six subjects on two stimulation electrodes with about four recording electrodes where response latencies could be calculated). The average response latency of all measurements with regressions explaining more than 95% of the variation was 35.6 ms (SD = 5.3 ms, n = 26 from six subjects on two stimulation electrodes with about two recording electrodes where response latencies could be calculated and the regression explained more than 95% of the variation). The influence of the position of the stimulation electrode in the cochlea on response latency was not significant across subjects (T = 5, p = .056), nor was the influence of the recording electrode (χ² = 10.7, df = 6, p = .098). It was not possible to check for the influence of stimulation mode on response latency as measurements were only done on one frequency per stimulation electrode for single bipolar and monopolar stimulation mode. The influence of interleaved bipolar and monopolar stimulation on phase delay compared to single bipolar stimulation was not significant (χ² = 1.73, df = 2, p = .42, and T = 10, p = .65, respectively).

FIG. 13

Histogram of response latencies in the 40 Hz range for all recordings with interleaved bipolar stimulation at C−25% BDR (n = 44).

Electrophysiological thresholds

Figure 14 compares the electrophysiological thresholds determined by the two methods described in the “Response measurements” section with the behavioral T levels at the same pulse rate (n = 25, six subjects with one interleaved and two single bipolar measurements, two subjects with one single monopolar measurement). Out of the 26 cases where amplitude growth was measured, one case did not allow the derivation of an objective threshold. For the remaining 25 cases, the bracketing method resulted in a mean difference between objective threshold and T level of 6.2 cu (SD = 6.1 cu, r = 0.964), while the extrapolation method resulted in a mean difference of 2.2 cu (SD = 6.4 cu, r = 0.961). The influence of single or interleaved stimulation on the estimated electrophysiological thresholds was significant for neither the bracketing nor the extrapolation method, with t(10) = −0.731, p = .48 and t(10) = −0.319, p = .76, respectively.
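The two threshold estimates can be illustrated with a short Python sketch. The exact definitions are given in the “Response measurements” section and are not repeated here, so the sketch rests on assumptions: bracketing is taken as the midpoint between the lowest level with a significant response and the highest nonsignificant level below it, and extrapolation as a least-squares line through the significant amplitudes extended to zero amplitude. Function names and example values are hypothetical.

```python
def bracketing_threshold(levels, significant):
    """Midpoint between lowest significant level and highest nonsignificant level below it."""
    sig_levels = [l for l, s in zip(levels, significant) if s]
    if not sig_levels:
        return None
    lowest_sig = min(sig_levels)
    below = [l for l, s in zip(levels, significant) if not s and l < lowest_sig]
    if not below:
        return None  # threshold is not bracketed from below
    return (lowest_sig + max(below)) / 2

def extrapolation_threshold(levels, amplitudes, significant):
    """Level at which a line fitted to significant amplitudes crosses zero amplitude."""
    pts = [(l, a) for l, a, s in zip(levels, amplitudes, significant) if s]
    if len(pts) < 2:
        return None
    n = len(pts)
    mean_l = sum(l for l, _ in pts) / n
    mean_a = sum(a for _, a in pts) / n
    sxx = sum((l - mean_l) ** 2 for l, _ in pts)
    sxy = sum((l - mean_l) * (a - mean_a) for l, a in pts)
    if sxx == 0 or sxy <= 0:
        return None  # no growth with level, so no meaningful zero crossing
    slope = sxy / sxx
    intercept = mean_a - slope * mean_l
    return -intercept / slope
```

For hypothetical levels of 100 to 140 cu in steps of 10 cu with amplitudes of 10, 12, 40, 80, and 120 nV, of which only the last three are significant, bracketing gives 115 cu and extrapolation gives 110 cu.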

FIG. 14

Electrophysiological thresholds versus behavioral T levels for all amplitude growth measurements (n = 25). Gray bars denote the behaviorally determined dynamic range. Overlapping cases have been shifted to the top right by 1 cu.

Discussion

Artifact reduction

For the low-rate pulse trains that were used in this study, the effects of stimulus artifacts on response properties such as amplitude and phase were significant and had to be reduced. The slow-decay component of the stimulus artifacts up to 1 ms was removed by the averaging of responses to pulse trains of alternating polarities, and the remaining initial spike of the artifacts could be eliminated by the interpolation of the time-domain recordings during the electrical stimulus and power-up pulses. The artifacts of monopolar stimuli had higher amplitudes and longer exponential slow-decay components.
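The two-step artifact reduction described above can be sketched in Python. This is a minimal illustration, assuming sampled recordings for the two polarities of equal length, known sample indices of the pulse onsets, a fixed blanking length, and linear interpolation across the blanked samples. The interpolation type, the function name, and all parameters are assumptions rather than the study's actual implementation.

```python
def remove_artifact(rec_pos, rec_neg, pulse_starts, blank_len):
    """Average alternating polarities, then interpolate linearly across each pulse window."""
    # Step 1: polarity averaging cancels the part of the artifact that inverts with polarity.
    avg = [(p + n) / 2 for p, n in zip(rec_pos, rec_neg)]
    out = list(avg)
    # Step 2: replace the samples during each pulse by a straight line between
    # the last sample before and the first sample after the blanked window.
    for start in pulse_starts:
        a = start - 1
        b = start + blank_len
        if a < 0 or b >= len(out):
            continue  # window touches the edge of the recording; leave it untouched
        for i in range(start, b):
            frac = (i - a) / (b - a)
            out[i] = avg[a] + frac * (avg[b] - avg[a])
    return out
```

Polarity averaging alone leaves any asymmetric residue in place, which is why the interpolation step is applied afterwards to the remaining initial spike.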

Response amplitudes as well as the amplitudes of the remaining artifact after averaging of alternating polarities increased with stimulus intensity. Artifact amplitudes seemed to show linear growth at lower stimulus intensities and nonlinear saturation caused by the CI’s current source at higher intensities, while response amplitudes were below noise level at lower intensities and increased sharply with intensities above the electrophysiological threshold, only in some cases showing saturation. The different amplitude growth of artifact and response amplitudes with stimulus intensity confirms that the detected responses were of neuronal origin and independent of the stimulus artifact.

On–off responses

Responses consisting of LLRs at the onset and the end of stimulation and delayed EASSRs during the stimulation could be recorded. Steady state conditions for the EASSRs were reached about 60 ms after stimulation onset. A possible explanation is provided by the modeling of ASSRs as superimposed transient responses. Acoustically evoked ASSRs in response to clicks in the 40 Hz range can be explained by superposition of several time-shifted ABRs and MLRs to repeated individual stimuli, most importantly waves V, NaPa, and NbPb (Bohorquez and Ozdamar 2008). As some of these components are only available after the third stimulus, steady state conditions could not be reached before 50 ms after the onset of stimulation. The additional response after the end of stimulation would correspond to the NbPb wave, which occurs about 50 ms after the last stimulus pulse.
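The superposition argument can be illustrated with a toy model in Python. The sketch assumes a purely hypothetical transient template (a damped oscillation lasting 60 ms) that is repeated at the pulse rate and summed. It is not a model of measured ABR or MLR waveforms and only demonstrates that the summed response becomes periodic once the first transient has fully elapsed.

```python
import math

TEMPLATE_MS = 60.0  # assumed duration of one transient response

def transient(t_ms):
    """Hypothetical transient response: damped oscillation, zero outside 0..60 ms."""
    if t_ms < 0 or t_ms >= TEMPLATE_MS:
        return 0.0
    return math.exp(-t_ms / 20.0) * math.sin(2 * math.pi * t_ms / 20.0)

def superimpose(rate_hz, n_pulses, t_ms):
    """Sum of time-shifted transients evoked by n_pulses at the given pulse rate."""
    period_ms = 1000.0 / rate_hz
    return sum(transient(t_ms - k * period_ms) for k in range(n_pulses))
```

At 41 Hz the pulse period is about 24.4 ms, so in this toy model the sum at time t equals the sum one period later only for t of at least 60 ms, that is, after the third pulse has been delivered. After the last pulse, the tail of the final transients remains visible.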

Response amplitudes

The responses showed nonlinear amplitude growth with increasing current level that did not saturate for most subjects. Saturation did occur for four out of 26 measurements, which might have been caused by the implant being out of compliance. The nonlinear amplitude growth may also be related to loudness growth, which might show similar nonlinear behavior. The response amplitudes for bipolar as well as monopolar stimulation were in the range of what is seen for ASSRs to acoustic noise and click stimuli in the 40 Hz range (Galambos et al. 1981; Stapells et al. 1984). As is the case with acoustically evoked ASSRs (Van Dun et al. 2009), recordings from electrodes positioned at the mastoids or at the back of the head resulted in the largest responses.

Analogous to multiple carrier frequencies for simultaneous stimulation with acoustically evoked ASSRs (Lins and Picton 1995), multiple responses could be evoked simultaneously by two interleaved pulse trains on different stimulation electrodes. Interleaved stimulation resulted in slightly lower response amplitudes than single stimulation on the same electrode, which could be caused by the stimulation of overlapping populations of nerve fibers.

At the same sensation level, monopolar stimulation created stronger responses than bipolar stimulation. Possible reasons include the activation of larger populations of nerve fibers from monopolar stimulation, higher synchronization of the activated nerve fibers to such stimulation pulses as well as the selection of different stimulation electrodes in bipolar and monopolar stimulation mode.

Response latency

The mean response latency of 35.6 ms (SD = 5.3 ms) for a frequency range of 35 to 47 Hz confirms the analogy with acoustically evoked ASSRs in response to clicks. Stapells et al. (1984) reported latencies of ASSRs for tone bursts of 33.3 ms (SD = 8.6 ms) for a frequency range of 35 to 55 Hz and Stapells et al. (1987) reported latencies of 41.1 ms (SD = 5.7 ms) for a frequency range of 29 to 54 Hz. For electrically evoked ASSRs in response to SAM tones with a 1 kHz carrier in guinea pigs, Jeng et al. (2008) reported response latencies of 22.1 ms (SD = 4.8 ms) for a frequency range of 13 to 55 Hz.

The response latency of about 12 ms for the 80 Hz range is close to latencies reported in the literature for SAM tones, such as 15.8 ms (SD = 0.8 ms) for 80 to 95 Hz at 6 kHz (John and Picton 2000), 13.1 ms for 75 to 95 Hz at 4 kHz (John et al. 2001), and 12 ms for 80 to 100 Hz at 1 kHz (Kuwada et al. 2002). It also corresponds to latency measurements of about 16 ms for acoustically evoked ASSRs in response to clicks in the frequency range of 70 to 90 Hz from other research done in our lab.

The small differences in response latency between recording electrodes for the same measurement that can be seen in Figure 10 may be caused by the fact that ASSRs are actually composite responses formed from various components that originate from different neural sources in the brain. For multichannel recordings, the variations in the recorded EEG signal between recording electrodes are caused by the different location and orientation of the neural sources relative to the recording electrodes. Components such as EABR wave V that differ in amplitude relative to the overall response between different recording electrodes (Figure 12) could, therefore, lead to the observed shift in response latency.

In contrast to acoustically evoked ASSRs where response latencies depend on the carrier frequency, EASSRs should have constant latencies independent of the excited place in the cochlea as no traveling wave delays the excitation of the nerve fibers (Picton et al. 2003). Additionally, response latencies for EASSRs should be smaller than for comparable acoustic stimuli because of the direct excitation of the nerve. A comparison between wave V latencies of 4.20 ms for EABRs to basal stimulation (Shallop et al. 1990) and 5.64 ms for ABRs (Nikiforidis et al. 1993) shows a difference of about 1.4 ms, while the necessary delay of the electric pathway for psychoacoustically synchronous bimodal stimulation is about 1.5 ms (Francart et al. 2009).

No consistent influence of the stimulation electrode on the obtained response latency was found, confirming that EASSRs are independent of the place of excitation in the cochlea and that the responses are most certainly caused by generators at the brainstem or cortex level. It is hypothesized that multiple generators may be responsible for the ASSRs in the 40 Hz range (Picton et al. 2003). This might be the reason for the high standard deviations of the latency estimates across subjects compared to the 80 Hz range where only one dominant generator seems to be active. As the sampling of the mixture of multiple generators will vary between the different recording electrodes, the response latency measurements might be distorted by the interaction of the individual latencies of the generators. To avoid this, a regression limit of 95% was used for the calculation of apparent latency to select only electrodes with one predominant generator. At least two electrodes fulfilled this condition for each subject where latency data were available.

There was no significant effect of interleaved or monopolar stimulation on the phase delay, hinting that no significant additional distortion on the response latencies was introduced by these stimulation modes compared to single bipolar stimulation. This should make it possible to repeat these measurements with interleaved monopolar stimuli. Monopolar stimuli are more clinically relevant with respect to the fitting of speech processors of CI children and interleaving reduces the time needed for such measurements.

Electrophysiological thresholds

Both methods for estimating electrophysiological thresholds from amplitude growth functions worked equally well to estimate T levels at the same pulse rate as used for stimulation (r > 0.96). The differences between the behavioral T levels and the objectively determined electrophysiological thresholds for the threshold bracketing could be caused by an inadequate signal-to-noise ratio that prevents the detection of responses with small amplitudes. For the extrapolation method, non-linear response growth with increasing stimulus intensity such as seen for acoustically evoked ASSRs may be one of the reasons for the measured differences (Lins et al. 1995). Additionally, there is still no proof that the behaviorally obtained T levels as described above are optimal regarding speech perception. There are different methods of determining T levels representing the minimum audible stimulus intensity (van Wieringen and Wouters 2001) and speech understanding seems to be insensitive to small T level changes (Smoorenburg et al. 2002). It has been shown in different studies that thresholds using EABRs and compound action potentials (ECAPs) are not optimal predictors for hearing thresholds (Miller et al. 2008), but adults with ECAP derived maps instead of their normal maps do not seem to perform significantly worse (Craddock et al. 2003).

In this study, we have demonstrated the possibility of using electrical pulse trains to evoke ASSRs. As the determined electrophysiological thresholds and the behavioral T levels are highly correlated, EASSRs at pulse rates in the 40 Hz range should allow the objective evaluation of the underlying MLRs and the neural processing of such electrical stimuli. Additionally, it should be possible to extend the obtained results to lower rates that are processed more cortically and are relevant for speech envelope detection. However, the high correlations were shown for lower pulse rates than actually used in clinical speech processors, and the electrophysiological thresholds were compared to behavioral T levels at the same pulse rate as used for stimulation. To improve the relation between electrophysiological thresholds and behaviorally determined T levels at clinically relevant pulse rates, EASSRs need to be obtained for pulse rates in the range of 500 to 1,000 Hz. As the stimulus artifacts last on the order of a few hundred microseconds, the presented method for artifact removal may generalize to these higher pulse rates. Future studies that concentrate on the use of alternative stimuli such as high-rate pulse bursts or high-rate pulse trains with low-frequency amplitude or phase-width modulation (Menard et al. 2004), which might create discharge patterns that are more similar to the ones generated by the use of clinical speech processors (John and Picton 2000), might close the gap between this exploratory study and clinical practice.

Conclusions

The experiments done in this study show that it is possible to record and interpret EASSRs in CI users. A recording setup was presented that allows the recording of neurological responses to stationary electrical stimuli. The artifact removal process used allows the evaluation of ASSR properties such as shape, amplitude, and phase.

The properties of electrically evoked ASSRs are very similar to those of acoustically evoked responses, including amplitude growth and latencies in the 40 and 80 Hz range. EASSRs allow the derivation of electrophysiological thresholds that have a very high correlation with behavioral T levels. In the near future, this methodology may allow the development of objective fitting paradigms of CIs with improved correlation to behavioral fitting outcomes.