INTRODUCTION

During the past two decades, a number of computational models of the electrically stimulated auditory nerve (AN) have been developed. These include biophysical (Colombo and Parkins 1987; Finley et al. 1990; Frijns et al. 1996; Cartee 2000; Matsuoka et al. 2001; Rattay et al. 2001; Morse and Evans 2003) and phenomenological models (Shannon 1989; Bruce et al. 1999b; Rubinstein et al. 2001; McKay et al. 2003; Xu and Collins 2005; Carlyon et al. 2005). The development of computational models may guide our understanding of the AN properties and help to design future coding strategies for cochlear implants (CIs).

On the one hand, biophysical models are typically conductance-based and aim to describe the kinetics of ion channels using the formalism of Hodgkin and Huxley (1952). Most of these models are derived from cold-temperature animals such as the squid (Hodgkin and Huxley 1952) or the toad (Frankenhauser and Huxley 1964) and need further adjustments to match the properties of the human AN (Rattay et al. 2001). The gating processes need to be accelerated to account for the temperature difference, and conductance values need to be corrected to account for the higher channel density, thereby leading to relatively short time constants. Cartee (2000) studied the summation and refractory properties of the AN using several conductance-based models. She determined thresholds for a single pseudomonophasic pulse and for a pair of pulses separated by an interpulse interval. She showed that the threshold for the pulse pair was lower than for the single pulse for intervals smaller than 500 μs, irrespective of the model and of the temperature value (varying from 20 to 39°C), and that this difference vanished for longer intervals. Similarly, the introduction of an inter-phase gap (IPG) between the two phases of a biphasic (BP) pulse was shown to reduce the threshold of excitation in animal and computational models (van den Honert and Mortimer 1979; Shepherd and Javel 1999). However, this threshold drop remained approximately constant when the IPG was increased beyond 100 μs. This is at odds with a recent study by Carlyon et al. (2005), who reported that thresholds of cochlear implantees for BP pulses continued to drop as the IPG increased up to 4.9 ms, the longest value tested. They showed that this drop was not due to a release of refractoriness at levels central to the AN but rather to a specific process at the level of the cochlea/AN. This may involve ion channels with longer time constants, which were not taken into account in previously published biophysical models of the AN. The exploration of individual ion channels in conductance-based models can be complicated because of their strong nonlinearities and the numerous parameters they use.

On the other hand, phenomenological models are based on experimental results and aim to describe trends of data using general and simple mathematical laws. They have proved, in some cases, to be accurate predictors of psychophysical results. Shannon (1989) proposed a dual-process model that could predict a wide range of data acquired for biphasic and sinusoidal stimuli. However, Carlyon et al. (2005) pointed out that this model could not account for the effects of introducing an IPG between the two phases of a BP pulse. In this same study, Carlyon et al. introduced a linear filter model derived from behavioral thresholds for sinusoidal stimuli, which could account for the effects of IPG. Since then, their model has successfully predicted cochlear implantees’ thresholds for a wide range of stimulus waveforms (Macherey et al. 2006; van Wieringen et al. 2006). Although this model uses few parameters and predicts a large set of data, its physiological significance is not straightforward. In addition, its predictions are restricted to the threshold level and cannot account for loudness growth.

The aim of the present study is to develop a stochastic, phenomenological model of single-channel stimulation that can account for thresholds and loudness data of CI users subjected to a variety of stimulus waveforms. As the model of Carlyon et al. (2005) suggests a linear process underlying threshold levels in the AN, we believe that the linearization of conductance-based models’ equations can provide a useful tool to study the dynamics of the AN membrane. It may also, in return, provide a basis to develop more realistic conductance-based models of the human AN. Linearization of conductance-based models is valid in the subthreshold domain, where the variations of the transmembrane potential remain small (Mauro et al. 1970). Different subthreshold behaviors can be obtained depending on the characteristics of the neuron. Spiking neurons can be divided into two major classes based on their mechanisms of excitability as they go from quiescence to periodic firing. Type I neurons act as integrators of incoming signals; type II neurons act as resonators and show a peak response at a specific frequency (Izhikevich 2001; St Hilaire and Longtin 2004). This classification is based on the dynamical properties of a neuron and should not be confused with the anatomical classification of primary auditory neurons. These two types of neurons can also exhibit different behaviors in response to a subthreshold step-current: the transmembrane potential of type I neurons converges exponentially to the holding voltage, whereas type II neurons show, in some cases, damped oscillations (Richardson et al. 2003). Based on the results from previous psychophysical experiments with CI users, we hypothesize in this study that primary auditory neurons can exhibit these two types of behavior. First, nonmonotonic curves of threshold vs rate were observed in some cochlear implantees subjected to BP pulses having relatively long-phase durations (1 or 2 ms) (Shannon 1985; Pfingst et al. 1996) and to sinusoidal stimulation (Shannon 1985; Pfingst 1988; Miller et al. 1999a). In these studies, a threshold minimum was typically reached between 70 and 100 Hz. This frequency preference resembles the fundamental property of resonator-like neurons (type II). Frequency resonance was observed in a number of real and modeled neurons (reviewed in Hutcheon and Yarom 2000) and we will show that nonmonotonic functions obtained in CI listeners are consistent with the operation of such a mechanism. Second, at stimulation frequencies higher than 300 Hz for sinusoidal stimulation and also for biphasic stimulation with relatively short-phase durations (<500 μs), the neural membrane is believed to act as a leaky integrator of charge, similar to type I neurons (Moon et al. 1993).

Simple linear models can exhibit either integrative or resonant properties. The “leaky integrate-and-fire” model (Gerstner and Kistler 2002) is the most common and simplest integrator model. Izhikevich (2001) proposed an analog of it, which shows a resonance and which he termed the “resonate-and-fire” model. Similarly, Richardson et al. (2003) introduced the “generalized integrate-and-fire” model, which could also, in some cases, exhibit frequency resonance and subthreshold damped oscillations. These models can be obtained via linearization and simplification of conductance-based models’ equations. They aim to describe the subthreshold deviations of the transmembrane potential from the resting potential and assume that an action potential is initiated whenever the membrane voltage crosses a certain threshold. The “Methods” section of this paper presents a dual-process model, which uses both a linear integrator and linear resonator neurons. Although the characterization of the processes involves a large number of parameters, in the “Results” section, the values of these parameters are kept fixed while we compare the model predictions to the results of psychophysical experiments employing a wide range of pulse shapes.

METHODS

Description of the model

The present model includes four stages (Fig. 1): stimulation, subthreshold behavior, neural activation, and central integration.

FIG. 1 Schematic representation of the stochastic model.

Stimulation

We assume that a population of N f neural fibers close to the stimulating electrode is being driven by a current I Stim. The spatial spread of current is not modeled and we simply consider a uniform stimulation. No hypothesis concerning the mode of stimulation (monopolar or bipolar) is being made. A sampling frequency of 250 kHz is used for computation except for pulsatile stimuli with short-phase durations (<30 μs) where 500 kHz is used. The stimulus duration is always 100 ms unless otherwise stated.

Subthreshold behavior

We assume that the population of stimulated neurons is divided into two classes. Some are integrators and some are resonators. Although we assume in this study that two populations of neurons are modeled, the model can also be seen as describing the same population of neurons stimulated at distinct spatial locations or submitted to different patterns of excitation. The proportion of integrator neurons (λ) is assumed to be 0.5, the same as the proportion of resonator neurons (1−λ). We model the integrator as an RC circuit (conductance G 0 and capacity C 0) and the resonator as the circuit shown in Figure 1 (bottom part), consisting of an inductance L 1 and conductance g 1, in parallel to a capacity C 1 and shunted by a conductance G 1. These two models are the simplest linearized models that can exhibit integrative and resonant properties, respectively, and their mathematical description can be found in previous studies (Gerstner and Kistler 2002; Richardson et al. 2003; Brunel et al. 2003). However, for the sake of completeness, the main properties of the two filters are given in “Appendix 1.” The outputs of the filters V int and V res represent the subthreshold deviations of the transmembrane potential from the resting potential for the integrator and resonator neurons, respectively.
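As a rough illustration of the two filter types, the sketch below evaluates the impedance magnitude of each circuit over frequency. It is only a sketch under stated assumptions: the numerical values are hypothetical (they are not the Table 1 values), units are arbitrary, and the inductance L 1 and conductance g 1 are assumed to form a series branch placed in parallel with C 1 and G 1. The function names are illustrative.

```python
import numpy as np

# Hypothetical parameter values in arbitrary units (NOT the paper's Table 1).
C0, G0 = 1.0, 2 * np.pi * 1700.0   # integrator: cut-off G0 / (2*pi*C0) = 1,700 Hz
C1, G1 = 1.0, 100.0                # resonator capacity and shunt conductance
g1, L1 = 1666.7, 3.6e-6            # resonator branch (series topology is assumed)

def integrator_impedance(f_hz):
    """Impedance of a parallel RC circuit: Z = 1 / (G0 + j*w*C0)."""
    w = 2 * np.pi * np.asarray(f_hz, dtype=float)
    return 1.0 / (G0 + 1j * w * C0)

def resonator_impedance(f_hz):
    """Impedance of (L1 in series with g1) in parallel with C1 and G1."""
    w = 2 * np.pi * np.asarray(f_hz, dtype=float)
    branch_admittance = 1.0 / (1.0 / g1 + 1j * w * L1)
    return 1.0 / (G1 + 1j * w * C1 + branch_admittance)

freqs = np.arange(1.0, 2001.0)     # 1 Hz to 2 kHz in 1-Hz steps
f_peak_int = freqs[np.argmax(np.abs(integrator_impedance(freqs)))]
f_peak_res = freqs[np.argmax(np.abs(resonator_impedance(freqs)))]
print(f"integrator peak: {f_peak_int:.0f} Hz, resonator peak: {f_peak_res:.0f} Hz")
```

With these illustrative values, the integrator response is largest at the lowest frequency evaluated (low-pass behavior), whereas the resonator response peaks at a nonzero frequency.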

Neural activation

The subthreshold membrane potentials V int and V res are full-wave rectified to account for both polarities of the stimulus. In monopolar mode, a negative and a positive current input induce a depolarization and a hyperpolarization, respectively, of the fibers close to the active electrode. In addition, they also induce a hyperpolarization and a depolarization, respectively, of the same fibers at a location remote from the electrode (Rattay 1989). In bipolar mode, a negative current input depolarizes the fibers close to the active electrode and hyperpolarizes the fibers close to the return electrode. That is presumably why, irrespective of the stimulation mode, both polarities of a pulse can evoke neural spikes (Miller et al. 1999c). The full-wave rectification therefore assumes that both polarities of an alternating polarity stimulus are equally effective. We model membrane noise V noise as a Gaussian noise with amplitude distribution \( N[0, s^{2}] \) that changes its value every 4 μs. Spike initiation is assumed to occur whenever the transmembrane potential (V int + V noise or V res + V noise) crosses a threshold potential V thr, which is assumed to be constant across fibers. The probability of firing at a given discrete time k (k = 1...n; n being the number of samples contained in the stimulus) is thus given by:

$$ P_{\operatorname{int}}(k) = \frac{1}{2}\left( 1 + \text{erf}\left( \frac{\frac{\left| V_{\operatorname{int}}(k) \right|}{V_{\text{thr}}} - 1}{\sqrt{2}\,\text{RS}} \right) \right) \quad \text{and} \quad P_{\text{res}}(k) = \frac{1}{2}\left( 1 + \text{erf}\left( \frac{\frac{\left| V_{\text{res}}(k) \right|}{V_{\text{thr}}} - 1}{\sqrt{2}\,\text{RS}} \right) \right) $$
(1)

for the integrator and resonator units, respectively. Here, RS is the “relative spread” as defined by Verveen (1961) and erf is the error function.

$$ \text{RS} = \frac{s}{V_{\text{thr}}} $$
(2)
$$ \text{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^{2}} \, \text{d}t $$
(3)

These formulas are typical of a Bernoulli process and have already been derived in Bruce et al. (1999a).
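Equations 1 to 3 can be evaluated directly with a standard error function. The following is a minimal sketch, with an illustrative function name and V thr set to 1 by default as an assumption; the default RS of 0.18 is the value used later for the stochastic model.

```python
import math

def firing_probability(v, v_thr=1.0, rs=0.18):
    """Eq. 1: per-sample firing probability for a subthreshold potential v.

    abs(v) implements the full-wave rectification; rs is the relative
    spread of Eq. 2 (noise standard deviation divided by v_thr).
    """
    x = (abs(v) / v_thr - 1.0) / (math.sqrt(2.0) * rs)
    return 0.5 * (1.0 + math.erf(x))

for v in (0.0, 0.5, 1.0, 1.5):
    print(v, firing_probability(v))
```

By construction, the probability equals 0.5 when the rectified potential equals V thr, and it is the same for potentials of opposite sign.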

Central integration

Loudness is commonly believed to be integrated at a level central to the AN. Physiologically, this integration phenomenon probably relates to the integration of neural activity. Central neurons may fire only if they receive a sufficient number of input spikes (Middlebrooks 2004). We hypothesize here that the loudness of a stimulus relates to the number of spikes initiated at the AN level within a certain temporal window. We use a series of 20-ms rectangular windows W i (i  =  1...M, M being the number of windows contained in the stimulus), with a 0.5-ms step increment and integrate the probability of firing of the two processes (integrator and resonator) across each window. Loudness perception is assumed to relate to the maximum firing probability during any of the temporal integration windows. For the sake of simplicity, the case of repetitive firing is not considered and refractory effects are not modeled. The probability of firing \( P^{W_{i}}_{\text{firing}} \) of a fiber during the ith temporal window is given by:

$$ P^{W_{i}}_{\text{firing}} = \lambda P^{W_{i}}_{\operatorname{int}} + (1 - \lambda) P^{W_{i}}_{\text{res}} \quad \text{with} \quad \left\{ \begin{aligned} P^{W_{i}}_{\operatorname{int}} &= 1 - \prod_{k \in W_{i}} \left( 1 - P_{\operatorname{int}}(k) \right) \\ P^{W_{i}}_{\text{res}} &= 1 - \prod_{k \in W_{i}} \left( 1 - P_{\text{res}}(k) \right) \end{aligned} \right. $$
(4)

\( P^{W_{i}}_{\operatorname{int}} \) and \( P^{W_{i}}_{\text{res}} \) are the firing probabilities of the integrator and resonator neurons, respectively, during W i . For most of the conditions, the sampling frequency equals the rate of noise variations (250 kHz). In these cases, the window W i simply contains all the samples. For short-phase duration stimuli (<30 μs), because the sampling frequency is 500 kHz, the V int and V res vectors are first downsampled to the rate of noise variations before performing the central integration of Eq. 4.
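The window integration of Eq. 4 can be sketched as follows, assuming per-sample probabilities already sampled at the 250-kHz noise rate and a stimulus at least one window long. The function names are illustrative, and summing logarithms instead of multiplying thousands of factors is an implementation choice made here, not something stated above.

```python
import numpy as np

def window_firing_probability(p, fs=250_000, win_s=0.020, step_s=0.0005):
    """1 - prod(1 - p[k]) over each sliding window (inner terms of Eq. 4).

    p holds the per-sample firing probabilities of one process, with
    values in [0, 1). One probability is returned per window position.
    """
    p = np.asarray(p, dtype=float)
    win = int(round(win_s * fs))      # samples per 20-ms window
    step = int(round(step_s * fs))    # samples per 0.5-ms step
    # Cumulative sum of log(1 - p): a window product becomes a difference.
    cum = np.concatenate(([0.0], np.cumsum(np.log1p(-p))))
    starts = np.arange(0, len(p) - win + 1, step)
    return 1.0 - np.exp(cum[starts + win] - cum[starts])

def max_firing_probability(p_int, p_res, lam=0.5):
    """Eq. 4 for every window, followed by the maximum across windows."""
    p_win = (lam * window_firing_probability(p_int)
             + (1.0 - lam) * window_firing_probability(p_res))
    return float(p_win.max())
```

For a 100-ms stimulus sampled at 250 kHz, this yields 161 window positions of 5,000 samples each.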

Threshold detection and most comfortable level estimation

Two different cases are studied: deterministic (V noise  =  0) and stochastic (V noise  ≠  0) cases. For the deterministic model, all resonator neurons fire once V res exceeds V thr. Similarly, all integrator neurons fire once V int exceeds V thr. We assume that threshold is reached when one of the two potentials V int or V res exceeds V thr. The level needed to reach threshold is therefore simply inversely proportional to the maximum of V res and V int after full-wave rectification. For the stochastic model, we determine thresholds using an analytical technique derived from signal detection theory and described in Bruce et al. (1999b). Let a sequence of independent random variables (RVs) X j represent the firing state of each neuron during the ith temporal window (j  =  1...N f). As we do not consider the case of repetitive spiking, each X j represents the binary firing state of the jth fiber (X j  = 0 if the neuron did not fire at all during the window and X j  = 1 if it did). The output of the model X is the sum of the X j and is itself a RV that can be well approximated either by a Poisson distribution or a normal distribution depending on the value of its mean (Bruce et al. 1999a).

The probability of obtaining m spikes (m between 0 and N f) during the window W i is given by:

$$ P_{X}(X = m) = P_{X}(m) = \begin{cases} e^{-\mu} \dfrac{\mu^{m}}{m!} & \text{if}\;\mu \leqslant 15 \\ \dfrac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(m - \mu)^{2}}{2\sigma^{2}}} & \text{if}\;\mu > 15 \end{cases} $$
(5)

Here, μ and σ are the mean and standard deviation of X.

$$ \mu = N_{\text{f}} P^{W_{i}}_{\text{firing}} $$
(6)
$$ \sigma^{2} = \lambda N_{\text{f}} P^{W_{i}}_{\operatorname{int}} \left( 1 - P^{W_{i}}_{\operatorname{int}} \right) + (1 - \lambda) N_{\text{f}} P^{W_{i}}_{\text{res}} \left( 1 - P^{W_{i}}_{\text{res}} \right) $$
(7)
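A minimal sketch of Eqs. 5 to 7 is given below. The function name is illustrative, and evaluating the Poisson branch in log space is an implementation choice made here to avoid overflow in the factorial; it is not part of the description above.

```python
import math

def spike_count_pmf(m, p_int, p_res, n_f=10_000, lam=0.5):
    """Eqs. 5-7: probability of m spikes during one integration window.

    p_int and p_res are the window firing probabilities of the two
    processes (Eq. 4); n_f is the number of fibers.
    """
    mu = n_f * (lam * p_int + (1.0 - lam) * p_res)                 # Eq. 6
    var = n_f * (lam * p_int * (1.0 - p_int)
                 + (1.0 - lam) * p_res * (1.0 - p_res))            # Eq. 7
    if mu <= 15:
        if mu == 0.0:
            return 1.0 if m == 0 else 0.0
        # Poisson branch of Eq. 5, computed as exp(log pmf).
        return math.exp(-mu + m * math.log(mu) - math.lgamma(m + 1))
    # Normal branch of Eq. 5 (assumes var > 0).
    return math.exp(-(m - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

pmf = [spike_count_pmf(m, 0.01, 0.01) for m in range(10_001)]
print(sum(pmf))
```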

We can then determine the probability of correct detection Pr of the signal in a two-interval forced-choice task in a way identical to Bruce et al. (1999b). Consider two RVs X[I Stim] and X[0] that describe the number of discharges during one of the temporal integration windows in response to a stimulus of amplitude I Stim and 0, respectively. The probability Pr of choosing correctly the signal of amplitude I Stim is equal to the probability that more spikes are initiated in response to I Stim plus the probability of making a correct guess.

$$ \Pr\left[ I_{\text{Stim}} \right] = \sum_{m = 0}^{N_{\text{f}}} \left( P_{X[0]}(m) \sum_{l = m + 1}^{N_{\text{f}}} P_{X[I_{\text{Stim}}]}(l) \right) + \frac{1}{2} \sum_{m = 0}^{N_{\text{f}}} P_{X[0]}(m) P_{X[I_{\text{Stim}}]}(m) $$
(8)

By increasing the amplitude of the stimulus I Stim, we obtain a psychometric function that rises from 50% (chance level) to 100%. Behavioral thresholds are often measured using a two-down, one-up procedure, which converges toward the 70.71% correct level (Levitt 1971) and we will always use this criterion for the stochastic threshold predictions presented in the following sections. Practically, the amplitude needed to reach threshold is adaptively tracked. The algorithm stops when the probability Pr reaches 70.71% with an error less than \( 10^{-5} \).
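Equation 8 and the level tracking can be sketched as follows. The exact adaptive algorithm is not specified above, so a plain bisection on a monotonically increasing psychometric function is used here as a stand-in; the function names and the `max_iter` safeguard are assumptions.

```python
import numpy as np

def percent_correct(pmf_signal, pmf_silence):
    """Eq. 8: probability of a correct response in a two-interval task.

    Both arguments are spike-count distributions over m = 0..N_f.
    """
    ps = np.asarray(pmf_signal, dtype=float)
    p0 = np.asarray(pmf_silence, dtype=float)
    tail = ps.sum() - np.cumsum(ps)   # tail[m] = sum of ps[l] for l > m
    return float(np.sum(p0 * tail) + 0.5 * np.sum(p0 * ps))

def track_level(pc_of_level, lo, hi, target=0.7071, tol=1e-5, max_iter=200):
    """Bisection stand-in for the adaptive tracking of threshold level.

    pc_of_level maps a stimulus level to percent correct and is assumed
    to increase monotonically between lo and hi.
    """
    mid = 0.5 * (lo + hi)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        pc = pc_of_level(mid)
        if abs(pc - target) < tol:
            break
        if pc < target:
            lo = mid
        else:
            hi = mid
    return mid
```

When the two distributions are identical, `percent_correct` returns the chance level of 0.5, which is the lower asymptote of the psychometric function.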

Most comfortable level (MCL) is assumed to correspond to a certain number of spikes elicited during the central integration window. Several spike counts will be studied in the “Results” section. As for threshold estimation, the amplitude of the stimulus that produces a desired number of spikes is adaptively tracked.

Parameter fitting

Absolute thresholds and MCLs can vary greatly among CI listeners. These differences may partly relate to neural survival, electrode placement, and geometry. The present model is designed and will only be used to make relative predictions, i.e., comparison of thresholds and MCLs in decibels between several stimuli. As described in “Appendix 1,” the transfer function of the integrator can be expressed as a function of a capacity C 0 and a time constant τ 0. Similarly, the transfer function of the resonator can be expressed as a function of a capacity C 1, a time constant τ 1, and two dimensionless parameters α and β. We define δ as the ratio of the two capacities \( \delta = C_{1} / C_{0} \). The transfer function of the integrator can now be expressed as a function of C 1, δ, and τ 0. Because the two subthreshold processes are linear, the threshold or MCL difference in decibels between two arbitrary stimuli is a function of τ 0, τ 1, α, β, and δ and does not depend on the value of C 1 or V thr. The two parameters V thr and C 1 can be merged into a single variable (the product C 1 × V thr), which defines an absolute reference. Practically, the product C 1 × V thr is set to 1 to perform the computation. Then, for each set of data, the model predictions are adjusted (by vertical translation) to match one of the data points.

Deterministic model

The deterministic model uses five parameters (three from the resonator process, two from the integrator process). The three parameters τ 1, α, and β of the resonator process are determined by assuming that:

  (1) There is a frequency resonance at 80 Hz (this is the approximate value for which a minimum threshold is observed in some subjects subjected to sinusoidal stimulation; Pfingst 1988).

  (2) The amplitude of the complex impedance of the resonator is 1 dB larger at 100 Hz than at 50 Hz and 14 dB larger at 100 Hz than at 200 Hz (these are the mean threshold differences determined by Miller et al. (1999a) using monopolar sinusoidal stimulation).

Under these conditions, the model exhibits subthreshold oscillations after a step-current input. The frequency of these damped oscillations is 81.5 Hz (cf. Richardson et al. 2003 for calculation).
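The frequency of the damped oscillations of a linear resonator can be read from the eigenvalues of its state matrix. The sketch below illustrates this for a generic circuit; the numerical values are hypothetical (they are not the fitted values and are not expected to reproduce the 81.5-Hz figure), units are arbitrary, and L 1 and g 1 are assumed to form a series branch.

```python
import numpy as np

# Hypothetical values in arbitrary units (NOT the fitted parameters).
C1, G1, g1, L1 = 1.0, 100.0, 1666.7, 3.6e-6

# State x = [V, i_L], assuming L1 and g1 form a series branch:
#   C1 * dV/dt   = -G1 * V - i_L + I
#   L1 * di_L/dt =  V - i_L / g1
A = np.array([[-G1 / C1, -1.0 / C1],
              [1.0 / L1, -1.0 / (g1 * L1)]])

eig = np.linalg.eigvals(A)
osc_hz = np.max(np.abs(eig.imag)) / (2 * np.pi)   # damped oscillation frequency
decay_per_s = -np.max(eig.real)                    # decay rate of the envelope
print(f"oscillation: {osc_hz:.1f} Hz, decay rate: {decay_per_s:.1f} per second")
```

A nonzero imaginary part together with a negative real part indicates damped oscillations after a step-current input; purely real eigenvalues would indicate an exponential return to rest instead.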

The two parameters τ 0 and δ of the integrator unit are adjusted so that:

  (1) The slope of the strength duration function (threshold vs phase duration function for single BP pulses) of the integrator has a mean decrease of 3.6 dB per doubling of phase duration from 12.5 to 400 μs (following Moon et al. 1993; cf. Fig. 2a).

  (2) The strength duration functions of the two processes equate at 500 μs. This is the approximate value for which a change of slope is observed in psychophysical data (Shannon 1985; Moon et al. 1993). This change of slope is assumed to be the point where the two processes contribute to loudness perception in an equal fashion. For phase durations lower than 500 μs, the integrator process dominates and for phase durations higher than 500 μs, the resonator process dominates.
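The first constraint can be checked with a closed-form sketch of the integrator alone. Two assumptions are made here: the time constant is taken as τ 0 = 1/(2π × 1,700 Hz), derived from the integrator cut-off frequency quoted in the “Results” section rather than from Table 1, and the peak of the rectified response to a zero-IPG BP pulse is taken at the end of the first phase, where it is proportional to 1 − exp(−PW/τ 0).

```python
import math

TAU0 = 1.0 / (2 * math.pi * 1700.0)   # seconds; assumes cut-off = 1/(2*pi*tau0)

def integrator_threshold_db(pw_s, tau=TAU0):
    """Relative deterministic threshold (dB) of an RC integrator for one BP pulse.

    Threshold is inversely proportional to the peak response, which is
    proportional to 1 - exp(-PW/tau) for a rectangular first phase.
    """
    return -20.0 * math.log10(1.0 - math.exp(-pw_s / tau))

durations_us = [12.5 * 2 ** k for k in range(6)]           # 12.5 to 400 us
thr = [integrator_threshold_db(d * 1e-6) for d in durations_us]
mean_slope = (thr[0] - thr[-1]) / (len(thr) - 1)            # dB per doubling
print(f"mean decrease: {mean_slope:.2f} dB per doubling")
```

Under these assumptions, the mean decrease comes out close to the 3.6 dB per doubling targeted by the fit. Because only ratios of responses enter the result, the absolute reference C 1 × V thr plays no role.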

FIG. 2 (a) Deterministic thresholds of the integrator and resonator processes as a function of phase duration. (b) Strength duration functions for different relative spread (RS) values for the stochastic model. The deterministic case corresponds to V noise  =  0.

Under these conditions, the slope of the strength duration function of the two processes combined has a mean decrease of 5.6 dB per doubling of phase duration from 500 to 8000 μs (similar to the mean value of 5.7 dB obtained by Moon et al. for durations greater than 500 μs; cf. Fig. 2a). The numerical values of these parameters are provided in Table 1. In addition, a sensitivity analysis of these same parameters is given in “Appendix 2.”

TABLE 1 Numerical values of the deterministic and stochastic model parameters

Stochastic model

The stochastic model uses six parameters (the same five parameters as the deterministic model and the RS value). Figure 2b shows the influence of RS on the strength duration function for a single BP pulse, as predicted by the stochastic model, which combines both processes. As RS is increased, the mean slopes get steeper than for the deterministic case. Consider the case RS  =  0.18. For durations smaller than 500 μs, the integrator process dominates and the mean threshold decrease is less than 6 dB per doubling of phase duration. For durations larger than 500 μs, the resonator process dominates and the threshold decrease is more than 6 dB per doubling of phase duration. Although these slopes are steeper than those obtained by Moon et al. (1993) for single pulses presented in bipolar mode, we will see in the “Results” section that they are still consistent with data obtained by Shannon (1989) in monopolar mode using 10-pps BP stimuli. In the following sections, the predictions of the stochastic model are always obtained using a fixed RS of 0.18. N f is assumed to be 10,000, the same number used in the model of Bruce et al. (1999b). To determine MCL, we calculate the current needed to evoke 100 and 1,000 spikes during at least one integration window, which is equivalent to finding the current that leads to a maximal P firing of 0.01 and 0.1, respectively (100/10,000 and 1,000/10,000). We use these two spike counts because they give reasonable dynamic range values (difference between MCL and threshold) compared to those commonly observed in CI stimulation.

Stimuli

Model simulations were performed for different stimulus waveforms. The results were compared to the experimental data obtained with CI subjects described in several reports (Shannon 1985; Miller et al. 1997; McKay and Henshall 2003; Carlyon et al. 2005; Macherey et al. 2006; van Wieringen et al. 2006). An overview of the pulsatile stimuli is provided in Figure 3. Predictions of thresholds and/or MCLs were made for the following stimuli:

  (a) BP pulses with an IPG of zero,

  (b) Alternating biphasic (ALT-BP) pulses in which the leading polarity alternates from pulse to pulse,

  (c) Pseudomonophasic (PS) pulses, which consist of a short phase immediately followed by a lower-amplitude opposite phase eight times longer than the first,

  (d) Alternating pseudomonophasic (ALT-PS) pulses,

  (e) Delayed pseudomonophasic (DPS) pulses, which are identical to PS except that the long/low phase is delayed to be midway between two subsequent pulses,

  (f) Alternating delayed pseudomonophasic (ALT-DPS) pulses,

  (g) BP pulses with an IPG (BP + IPG) longer than in (a),

  (h) An alternating polarity version of BP + IPG (ALT-BP + IPG),

  (i) Alternating monophasic (ALT-M) pulses, which are identical to BP pulses except that again, the second phase is delayed to be midway between two subsequent pulses, and

  (j) BP pulses with an IPG where two subsequent phases have the same polarity (ALT-BP-SAME + IPG).

FIG. 3 Overview of stimulus waveforms used for model predictions. PW denotes the phase duration of the pulses.
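A few of these waveforms can be generated as sampled current vectors, as sketched below. The function names are illustrative, and the second-phase amplitude of the PS pulse (one eighth of the first phase) is an assumption made here so that the pulse is charge-balanced; phase durations are rounded to a whole number of samples.

```python
import numpy as np

def biphasic(pw_s, ipg_s=0.0, fs=250_000, amp=1.0):
    """One BP pulse: first phase, optional inter-phase gap, opposite phase."""
    n_pw, n_ipg = int(round(pw_s * fs)), int(round(ipg_s * fs))
    return np.concatenate((np.full(n_pw, amp), np.zeros(n_ipg), np.full(n_pw, -amp)))

def pseudomonophasic(pw_s, ratio=8, fs=250_000, amp=1.0):
    """One PS pulse: short phase, then an opposite phase `ratio` times longer.

    The amplitude amp / ratio of the long phase is an assumption (charge balance).
    """
    n_pw = int(round(pw_s * fs))
    return np.concatenate((np.full(n_pw, amp), np.full(ratio * n_pw, -amp / ratio)))

def pulse_train(pulse, rate_pps, dur_s=0.1, fs=250_000, alternate=False):
    """Repeat `pulse` at a fixed rate; optionally flip polarity on every other pulse."""
    out = np.zeros(int(round(dur_s * fs)))
    period = int(round(fs / rate_pps))
    for i, start in enumerate(range(0, len(out), period)):
        seg = pulse[: len(out) - start]          # truncate at the stimulus end
        sign = -1.0 if (alternate and i % 2) else 1.0
        out[start:start + len(seg)] += sign * seg
    return out

bp_train = pulse_train(biphasic(100e-6), rate_pps=100)                      # (a)
alt_ps_train = pulse_train(pseudomonophasic(100e-6), 100, alternate=True)   # (d)
```

The delayed shapes (DPS, ALT-M) could be built in the same way by replacing the inter-phase gap with half the pulse period.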

RESULTS

Validity of the linearity assumption

As already pointed out in the “Introduction,” linearized equations of conductance-based models are known to provide a good approximation of the neural response in the subthreshold regime (Mauro et al. 1970). However, as the transmembrane potential approaches threshold, nonlinearities become more prominent, thereby questioning the validity of the linearity assumption. The construction of the model presented in the “Methods” section is based on the assumption that linearized equations of conductance-based models can still provide a good approximation of the original equations at threshold level. We test this hypothesis by comparing the threshold predictions of the Hodgkin and Huxley (HH) model of the squid giant axon to those of its linearized version. The HH model is chosen because its linear behavior is well known (Mauro et al. 1970; Koch 1984) and because it shows a subthreshold resonance.

The predictions for the HH model are made using the softcell package software (Weiss 2000). We use the original parameters of HH at a temperature of 6.3°C. Action potentials are detected using the method described in Phan et al. (1994), which tracks the membrane model’s ionic gating events. The stimuli are 100-ms sinusoids and the stimulus frequency is varied from 10 Hz to 4 kHz. For each frequency, the amplitude is raised in steps of 0.2 dB. Threshold is assumed to be crossed when at least one action potential is detected. We compare these predictions to those obtained from the equations of the linearized squid axon membrane (cf. Fig. 19b in Mauro et al. 1970). For the linear model, threshold is assumed to be inversely proportional to the maximal amplitude of the model’s response to a unitary input.

The predictions of the two models are illustrated in Figure 4. Both functions are U-shaped with minima around 60 Hz for the HH model (consistent with a previous study by French 1984) and 67 Hz for the linearized model (consistent with Koch 1984). These minima correspond to the resonance frequency of the HH model, which is a type II model. Although the slopes of the linear model predictions are shallower than those of the HH model, the patterns remain comparable and the linear model appears to provide a good estimate of the threshold trend near the resonance frequency. In the following paragraphs, the model described in the “Methods” section is used to make predictions of psychophysical results obtained in CI stimulation.

FIG. 4 Threshold predictions of the HH model of the squid axon and of its linearized version at 6.3°C.

Sinusoidal stimulation

Threshold vs frequency functions were measured in CI users for sinusoidal stimuli (Shannon 1983; Shannon 1985; Pfingst 1988). Thresholds are typically constant or slightly decrease with increases in frequency up to about 100 Hz. As already pointed out in the “Introduction,” some subjects show a threshold minimum between 70 and 100 Hz. Thresholds then increase at a rate of +15 dB per doubling of frequency from 100 to 250 Hz and at a rate of +3 dB per doubling of frequency for frequencies higher than 300 Hz. Figure 5 shows a summary plot of data (from Pfingst 1988) together with the predictions of the stochastic and deterministic models. As expected, the model predictions show a minimum at 80 Hz, the resonance frequency. For frequencies lower than 250 Hz, the resonator process dominates. For higher frequencies, both processes contribute to threshold (because threshold for the resonator process increases by +6 dB per doubling of frequency and threshold for the integrator process remains constant, the overall slope is approximately +3 dB per doubling of frequency). For frequencies higher than 1,000 Hz, the slope starts to be steeper as it gets closer to the cut-off frequency (1,700 Hz) of the integrator process.

FIG. 5 Threshold for sinusoidal stimulation as a function of frequency: summary of behavioral data (replotted from Pfingst 1988) and deterministic and stochastic model predictions. The models’ references are arbitrarily chosen.

Symmetric biphasic pulses

Shannon (1985) measured thresholds for BP pulse-train stimuli (condition Fig. 3a) and found that at short-phase durations (<500 μs), thresholds were constant for rates up to 100 pps and then decreased by approximately 3 dB per octave. For long-phase durations (>500 μs), thresholds first decreased with increases in rate up to about 100 pps and then increased, leading to nonmonotonic threshold vs rate functions. Similar nonmonotonic functions were obtained by Pfingst et al. (1996) for phase durations of 1,000 and 2,000 μs. The results of a typical subject (subject EHT from Shannon 1985) are shown in Figure 6a, together with the predictions of the stochastic (Fig. 6b) and deterministic (Fig. 6c) models. The deterministic predictions (for the two processes combined) show nonmonotonicities for relatively long-phase durations (>500 μs). The minimum is reached when the stimulus coincides with the subthreshold oscillations of the membrane. This is demonstrated in Figure 7 where the output voltages of the resonator filter are represented for a 2,000-μs phase duration BP stimulus at three different rates. At 50 pps (Fig. 7a), the onset of the second pulse occurs when the transmembrane potential is close to the resting potential, so the amplitude of the response to the second pulse is similar to the response to the first. At 100 pps (Fig. 7b), the second pulse coincides with a time when the membrane is already depolarized, so the response to the second pulse reaches a greater amplitude than the first. The opposite phenomenon occurs for 200 pps (Fig. 7c), where the onset of the second pulse coincides with a hyperpolarization of the membrane. The presence of noise has an amplifying effect on the nonmonotonicities, as shown by the predictions of the stochastic model (Fig. 6b). Also, the central integration window induces a threshold drop at high rates. As we used a 20-ms window, threshold starts to drop at about 50 pps. This is because more and more pulses fall into the window and increase the probability of threshold crossing. The predictions of the stochastic model provide a good match to the threshold data. In the same study, Shannon (1985) showed that the subjects’ dynamic range increased with phase duration (ranging from 5–10 dB for a 100-μs phase to 30–35 dB for an 8,000-μs phase). This was mainly because of a slow growth of loudness just above threshold. The stochastic model predicts this trend but underestimates the size of the increase. The predicted dynamic range increases from 5.9 to 12.4 dB for an MCL criterion of 100 spikes and from 8.4 to 15.5 dB for a criterion of 1,000 spikes.

FIG. 6

Summary of thresholds for biphasic stimulation as a function of phase duration and pulse rate. (a) Subject EHT (replotted from Shannon 1985). (b) Stochastic model predictions. (c) Deterministic model predictions. The model reference is chosen to match the BP threshold data at 100-μs phase duration and 10-pps rate.

FIG. 7

Output of the resonator process (V_res) for a 2,000-μs phase duration BP pulse-train stimulus at three different rates. For illustration purposes, C_1 is set to 1 μF.
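The timing argument illustrated in Figure 7 can be sketched with an idealized damped oscillation. The 100-Hz oscillation frequency and the 5-ms decay constant used below are illustrative assumptions, not the fitted parameters of the resonator, and the response to a 2,000-μs BP pulse is replaced by a damped cosine. The sketch only shows how the state of the membrane at the onset of the next pulse can change sign with rate.

```python
import math

F_OSC_HZ = 100.0   # assumed subthreshold oscillation frequency (hypothetical)
TAU_S = 0.005      # assumed decay constant of the oscillation (hypothetical)

def residual_at_next_pulse(rate_pps: float) -> float:
    """Normalized membrane deflection left over from one pulse at the onset
    of the next pulse, for an idealized damped cosine response."""
    dt = 1.0 / rate_pps  # time between successive pulse onsets, in seconds
    return math.exp(-dt / TAU_S) * math.cos(2.0 * math.pi * F_OSC_HZ * dt)

for rate in (50, 100, 200):
    print(f"{rate} pps: {residual_at_next_pulse(rate):+.3f}")
```

With these assumed values the residual is close to zero at 50 pps, positive at 100 pps, and negative at 200 pps, which mirrors the qualitative pattern described for the three panels.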

Asymmetric biphasic pulses

In a previous study (Macherey et al. 2006), we measured thresholds and MCLs for a variety of pulse shapes, including asymmetric stimuli in bipolar and monopolar mode. Thresholds decreased by 0 to 3 dB when using 100-pps PS stimuli (condition Fig. 3c) compared to a “standard” BP stimulus. A much larger decrease was found using stimuli with a relatively long IPG such as DPS (Fig. 3e) or ALT-M (Fig. 3i).

Mean thresholds and MCLs (Macherey et al. 2006) for BP, ALT-PS (Fig. 3d), and ALT-M stimuli are illustrated together with the stochastic model predictions in Figure 8a. The phase duration was 97 μs for the three stimuli. The rate was 198 pps for BP and ALT-PS and 99 pps for ALT-M, and the electrode configuration was bipolar. The stochastic model accounts for the decrease in threshold and MCL. The mean number of discharges as a function of stimulus level is shown in Figure 8b for the three pulse shapes. ALT-M gives the lowest threshold because the resonator process dominates. This is shown more clearly in Figure 9, where the stimulus waveforms and the integrator and resonator outputs of the model are illustrated. The amplitudes of the integrator responses are similar for the three pulse shapes, unlike those of the resonator responses. For BP, the amplitude of the resonator output remains low because the second phase counteracts the effect of the first. For ALT-M, the first phase induces a depolarization of the membrane. Then, during the IPG, the transmembrane potential starts to oscillate, and the second phase of the pulse hyperpolarizes the membrane, which is already hyperpolarized at that time. Therefore, the voltage oscillations are amplified because the membrane potential is driven at a frequency more or less coinciding with that of the subthreshold oscillations. Additional predictions were made for other pulse shapes published in Macherey et al. (2006). Figure 10a, b summarizes these results, showing the mean data together with the model predictions. The stochastic model can account for the general trend of the data at both short and long phase durations and at both low and high rates.

FIG. 8

(a) Comparison of mean data from Macherey et al. (2006) and stochastic model predictions. The model reference is chosen to match the mean BP threshold data. (b) Mean number of spikes elicited during the analysis window and standard deviations as a function of stimulation level for the three waveforms BP, ALT-PS, and ALT-M. The phase duration is 97 μs. The rate is 198 pps for BP and ALT-PS and 99 pps for ALT-M.

FIG. 9

Stimulation current (top row), output of the integrator (middle row), and output of the resonator (bottom row) for the BP, ALT-PS, and ALT-M stimuli (same parameters as in Fig. 8). For illustration purposes, C_1 is set to 1 μF.

FIG. 10

Mean data from Macherey et al. (2006) and stochastic model predictions. (a) 97-μs phase, 99-pps pulses. (b) 22-μs phase, 813-pps pulses. The ALT-DPS “short/high only” and “long/low only” correspond to the ALT-DPS stimulus with the long/low phases and the short/high phases removed, respectively. The model reference is the same for subpanels (a) and (b) and is chosen to match the mean BP (99-pps rate, 97-μs phase) threshold data.

Biphasic pulses with an inter-phase gap

The effects of IPG on thresholds and MCLs of CI users were presented in two previous publications (McKay and Henshall 2003; Carlyon et al. 2005).

First, McKay and Henshall found thresholds and MCLs to decrease with increases in IPG up to 100 μs, the longest value tested (condition Fig. 3g). This effect was greater at threshold than at MCL, greater at the shorter phase duration (26 μs vs 52 μs), and not significantly different for the two rate values tested (1,000 and 4,000 pps). Their results, together with the stochastic model predictions, are illustrated in Figure 11. In addition to the two spike counts used previously (100 and 1,000) to determine MCL, a third count of 10 spikes is studied. The stochastic model can account for the different observations at threshold but underestimates the current difference at MCL. In addition, an increase in the spike count for MCL estimation leads to a reduced effect of IPG on the predictions.

FIG. 11

Effects of IPG on threshold and MCL of BP stimuli. For three different conditions of phase duration (PW in microseconds) and rate (in pulses per second), the bars illustrate the mean current difference (in decibels) needed to maintain a constant loudness when the IPG varies from 8.4 to 45 μs and from 45 to 100 μs. Mean data (bars) and standard deviations from McKay and Henshall (2003) and stochastic model predictions of threshold (asterisks) and MCLs (using three different spike counts).
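Current differences expressed in decibels, as in Figure 11, are assumed here to follow the usual amplitude convention of 20 log10 of a current ratio. A minimal sketch, with a hypothetical helper name and made-up example currents:

```python
import math

def current_difference_db(i_a: float, i_b: float) -> float:
    """Level difference in dB between two current amplitudes (same units)."""
    return 20.0 * math.log10(i_a / i_b)

# Hypothetical example: a threshold current that falls from 200 to 180 uA
# when the IPG is lengthened (values invented for illustration only).
print(round(current_difference_db(200.0, 180.0), 2), "dB")
```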

Second, Carlyon et al. (2005) showed the effects of IPG to extend over several milliseconds using 100-pps stimuli. They also demonstrated that this effect depended on whether the polarity of the two phases of the pulse was the same (condition Fig. 3h) or opposite (Fig. 3j). When they were the same, thresholds slightly increased with IPG whereas in the opposite case, thresholds continued to drop as IPG increased up to 4.9 ms. The mean results and standard deviations of their four subjects are illustrated together with the stochastic model predictions for these two conditions (Fig. 12). The trends are well predicted by the model, which leads to considerable threshold reductions when two subsequent phases of opposite polarity are separated by a relatively long IPG (5 ms).

FIG. 12

Effects of IPG on the ALT-BP+IPG and ALT-BP-SAME+IPG conditions: mean data of four subjects from Carlyon et al. (2005) and stochastic model predictions.

We have also studied the effects of rate on ALT-M and BP thresholds and MCLs in cochlear implantees (van Wieringen et al. 2006). For BP stimuli, thresholds decreased with increases in rate whereas for ALT-M, thresholds first increased from 100 to 250 pps and then decreased. The predictions of the model together with the mean data of two subjects are illustrated for BP and ALT-M in Figure 13. The bell shape of the threshold function for the ALT-M stimulus (Fig. 13a) is also observed in the model results (Fig. 13b, c). This is because for BP, the integrator process dominates over the whole frequency range whereas for ALT-M, at low frequencies, the frequency of alternation approaches the subthreshold oscillation frequency of the resonator process so that thresholds are very low. As frequency increases, the resonator process is no longer driven at this preferred frequency and its threshold increases. At higher frequencies, threshold decreases again because of the central integration effect. MCLs show the same pattern but with smoother variations. Finally, the model also accounts for the increase in dynamic range with increases in rate, as observed in another study (Kreft et al. 2004).

FIG. 13

Effects of rate on thresholds (open symbols) and MCLs (filled symbols) for BP and ALT-M stimuli with a 97-μs phase duration. (a) Mean results of two subjects (from van Wieringen et al. 2006). (b) Stochastic model predictions with a spike count of 100 used for MCL. (c) Stochastic model predictions with a spike count of 1,000 used for MCL. For ease of comparison with the original data illustrated in van Wieringen et al. (2006), ALT-M is illustrated as a function of twice the pulse rate. The model reference is chosen to match the BP threshold data at 200 pps.

DISCUSSION

Resonance in neurons

Whereas leaky integration of charge is commonly accepted as the main process underlying biphasic threshold levels at short-phase durations (Moon et al. 1993), the nonmonotonicities of the threshold vs rate function observed for long-phase duration pulses have remained difficult to explain. In the present study, we have shown that a resonant process can account for these nonmonotonicities and for those observed in sinusoidal stimulation in some CI subjects. Moreover, it can account for the decrease in threshold with increases in IPG and for the increase in threshold with increasing frequency in ALT-M stimulation. Clopton et al. (1983) already hypothesized that nonmonotonic functions observed in animal models may result from a type of resonance in auditory neurons similar to what is observed in experimental and modeling studies of the squid axon (Guttman and Hachmeister 1971). The HH model is a type II model (Izhikevich 2001) and, at the temperature of the squid, does exhibit a frequency resonance around 60 Hz (Fig. 2). However, using the appropriate parameter corrections to account for the higher temperature and higher channel density of the human AN (Rattay et al. 2001), the time constants are smaller and the resonance frequency much higher. Consequently, the voltage-gated ion channels of the original HH model are not sufficient to explain the nonmonotonic trends observed in the human AN. The ion channel responsible for the nonmonotonicities exhibited by the present model has a relatively slow relaxation time constant of about 1 ms and induces subthreshold resonance and damped oscillations. Such an ion channel is not included in previously published conductance-based models of the AN and further investigations are needed to determine whether its existence is realistic or not. A large range of ion channels can exhibit resonant behavior (Hutcheon and Yarom 2000; Richardson et al. 2003). 
Two possible candidates are (1) the hyperpolarization-activated current (I_h) and (2) a combination of slow potassium and persistent sodium channels. First, the I_h channel is known to have long time constants and has already been found in mammalian spiral ganglion cells (Chen 1997; Mo and Davis 1997). Second, McIntyre et al. (2002) included slow potassium and persistent sodium channels in a model of mammalian motor nerve fibers and showed them to be responsible for the depolarizing afterpotentials, suggesting they may play a significant role in mammals. Also, Longnion and Rubinstein (2006) recently implemented slow potassium channels in their stochastic AN model. The identification of a potential resonant ion channel in the AN is, however, beyond the scope of this study, and the present model should only be considered a tentative attempt to give a physiologically based explanation of the nonmonotonic threshold functions obtained with CI users and of the effects of pulse shape on thresholds and MCLs.

Other hypotheses have already been proposed in previous articles to interpret nonmonotonic trends observed in CI sinusoidal and pulsatile stimulation (Shannon 1983; Shannon 1985; Pfingst et al. 1996; Miller et al. 1997) and they will be discussed in the following paragraphs. We will refer to the “descending arm” of the threshold function for the decrease in threshold with increases in frequency up to about 70–100 Hz in biphasic or sinusoidal stimulation and to the “ascending arm” of the threshold function for the increase in threshold with increases in frequency above 70–100 Hz in biphasic, sinusoidal, or ALT-M stimulation.

Peripheral and central processes

The first hypothesis that has to be discussed is whether the descending and ascending arms of the threshold functions are the result of a specific process at the level of the AN. Nonmonotonic threshold vs frequency functions were also obtained in central auditory neurons of mammals (Clopton et al. 1983) and avians (Schwarz et al. 1993; Strohmann et al. 1995). So the nonmonotonicities observed in psychophysical experiments may result from a frequency selectivity of neurons central to the AN and not, as assumed in the present study, from a process at the AN site.

The descending arm of the threshold function may relate to temporal integration at a location central to the AN. If the rate increases, then more pulses fall within a certain central integration window (Middlebrooks 2004). It is not clear, however, why this phenomenon would depend on the phase duration, showing steeper slopes for long-phase duration BP pulses. The present model provides an explanation for this trend. Part of this decrease comes from the central integration window, but the slope of the decrease also depends on the phase duration because, as the rate increases, it gets closer and closer to the subthreshold oscillation frequency of the resonator process. Therefore, the effect is larger when the resonator process dominates, i.e., at phase durations longer than 500 μs for BP stimuli.

Two observations suggest that the mechanism responsible for the ascending arm of the threshold functions is located at the AN site. First, Carlyon et al. (2005) showed that the decrease in threshold with increases in IPG up to 4.9 ms only occurred when the IPG was varied between two phases of opposite polarity and not when they were of the same polarity (cf. Fig. 12). They interpreted these findings as evidence of a mechanism at the level of the cochlea/AN rather than of a release of refractoriness at a more central level. Second, Zeng et al. (2000) found that thresholds for sinusoidal stimuli could be lowered if a subthreshold noise was added to the stimulus. They interpreted this result as a demonstration of stochastic resonance in the AN and showed that the threshold shift in the presence of noise was dependent on the sinusoidal frequency, being maximal around 100 Hz (the lowest frequency tested) and decreasing with increasing frequency. In this same study, Zeng et al. performed the same experiment with brainstem implantees (where the AN is bypassed) and, interestingly, did not find the same frequency dependence. This suggests that the frequency dependence arises at least partly from a process at the AN site and is not purely central. What they interpreted as a stochastic resonance effect may in fact relate to the enhancement of the response of resonator neurons stimulated close to their resonant frequency. This alternative explanation is supported by a report of Richardson et al. (2003) who studied the response of a simulated neuron with subthreshold resonance to a sinusoidal stimulus in the background of a white-noise source. They showed that when the noise was sufficiently strong to cause the neuron to fire irregularly, input frequencies close to the subthreshold resonance frequency were the most amplified ones.

Additional potential mechanisms

Potential mechanisms responsible for the ascending arm in biphasic stimulation with long-phase durations were reviewed in two different studies (Pfingst et al. 1996; Miller et al. 1997). They include refractoriness, accommodation, and residual potential effects.

Refractoriness is a plausible mechanism because as the time between two subsequent pulses is increased, the neurons that have fired after the first pulse will be more likely to fire again because they will progressively come out of their refractory period. As refractory effects are believed to occur at pulse separations up to 6 ms (Miller et al. 1997), this may partly explain the ascending arm of the threshold function for BP stimuli and also the decrease in threshold with increases in IPG. If this were the only explanation, however, the effect would be expected to be larger, or at least equal, at high levels of stimulation, where more fibers are excited, than at threshold. However, in a recent study using ALT-M pulses (van Wieringen et al. 2006; cf. Fig. 13), we found the opposite trend: the slope of the ascending arm was steeper at threshold than at MCL. Some subjects did not even show any ascending arm at MCL. So refractoriness effects are unlikely to be fully responsible for the ascending arm of the threshold function. Also, Miller et al. (1997) obtained different threshold vs pulse separation functions in nonhuman primates depending on whether or not the leading polarity of subsequent pulses alternated (conditions Fig. 3a, b). First, threshold functions for the BP shape (Fig. 14a, squares) at relatively long-phase durations (2 ms) were nonmonotonic, similar to what is found in humans. Second, the thresholds for the ALT-BP condition (circles) were similar to the thresholds for the BP condition at long-pulse separations (low rate) but were lower at shorter separations and did not show the nonmonotonic pattern. They suggested that this difference may involve refractory effects in a polarity-segregated neuron array. Our model provides an alternative explanation for this trend (Fig. 14b). Although the slopes are steeper for the model predictions, the relative trends remain similar. 
Slopes of behavioral threshold functions are typically steeper for humans than for other species (Miller et al. 1999a, b).

FIG. 14

(a) Psychophysical thresholds of a macaque monkey (from Miller et al. 1997) for the BP and ALT-BP stimuli. Thresholds were obtained for 20-pulse stimuli, so that stimulus duration covaried with pulse separation. (b) Stochastic model predictions. The model reference is chosen to match the mean BP threshold data at a 0.2-ms pulse separation.

Accommodation effects can occur after long subthreshold depolarizations because of the inactivation of sodium channels. Pfingst et al. (1996) suggested that this phenomenon may explain why the ascending arm in biphasic stimulation is observed at long-phase durations and not at shorter ones. However, the threshold increase with increasing rate observed in ALT-M stimulation (van Wieringen et al. 2006) was still evident at short-phase durations (25 μs), suggesting that long-phase durations are not necessary to produce the ascending arm.

Residual potential effects were investigated with CI users in two different studies using BP pulses with phase durations shorter than 50 μs (Eddington et al. 1994; de Balthasar et al. 2003). de Balthasar et al. (2003) measured the threshold of a BP pulse-train probe, which was interleaved with a subthreshold BP pulse-train masker presented on an adjacent channel. When the leading polarities of the masker and probe were opposite, the probe threshold was lower than its unmasked threshold for delays between masker offset and probe onset that were shorter than 150 μs. They observed the opposite trend when the leading polarities were identical. Eddington et al. (1994) found similar results with single pulses having opposite leading polarities for delays up to approximately 400 μs. However, when the leading polarity was the same, the masked threshold remained about 1 dB lower than the unmasked threshold and this difference persisted for delays up to 800 μs (the largest value tested). Although this last observation is not consistent with a residual potential summation, the other trends suggest that the neural membrane potential needs a finite time after the offset of a BP pulse to return to its resting value. The associated time constant (between 31 and 40 μs as calculated by de Balthasar et al.) is of the same order of magnitude as the time constant of the integrator process of our model (95 μs). At relatively short-phase durations, as used in those two studies, our model predicts that the integrator process dominates, leading to fast recovery to rest. However, the model also predicts that long-phase duration (>500 μs) or long-IPG BP pulses would produce extended residual potential effects because the (slower) resonator process would dominate.
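The time scales quoted above can be compared with a simple decay sketch. It assumes that the residual potential after pulse offset decays as a single exponential, which is an idealization introduced here for illustration; the function name is hypothetical, and only the time constants (95 μs, 31 and 40 μs) and the delays (150, 400, and 800 μs) come from the text.

```python
import math

TAU_INTEGRATOR_US = 95.0  # integrator time constant quoted in the text

def residual_fraction(delay_us: float, tau_us: float = TAU_INTEGRATOR_US) -> float:
    """Fraction of the residual potential remaining delay_us after pulse
    offset, assuming first-order (single-exponential) decay."""
    return math.exp(-delay_us / tau_us)

for tau in (31.0, 40.0, TAU_INTEGRATOR_US):
    fractions = [residual_fraction(d, tau) for d in (150.0, 400.0, 800.0)]
    print(f"tau = {tau:5.1f} us:", ", ".join(f"{f:.4f}" for f in fractions))
```

Under this assumption, all three time constants leave only a small fraction of the residual potential after a few hundred microseconds, consistent with the fast recovery expected when the integrator process dominates.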

Comparison to other phenomenological models and limitations

The construction of the present model was inspired by three previously published phenomenological models (Shannon 1989; Bruce et al. 1999a, b; Carlyon et al. 2005).

As in Shannon’s (1989) model, the present model uses dual processes. The resonator can be related to the “compressive” process of Shannon’s model and the integrator to its “envelope” process. The main difference is that the subthreshold processes of the present model are linear whereas Shannon used nonlinear power-law transformations. Moreover, as pointed out in the “Introduction,” Shannon’s model cannot account for the effects of IPG whereas the present model can. Shannon (1989) suggested that his compressive process may relate to spiral ganglion cell survival. The somas of spiral ganglion cells are larger in humans than in other mammals (Rattay et al. 2001) and may involve longer time constants than what is typically observed in single-cell recordings (Shepherd and Javel 1999).

In CI stimulation, the variance in response of the AN to electrical stimuli is believed to be essentially due to membrane noise (Matsuoka et al. 2001). Bruce et al. (1999a, b) developed a simple stochastic model that assumed a perfect integration of charge. Their model can account for the effects of rate and phase duration in biphasic stimulation. However, only the cathodic phase of the pulse is assumed to be effective, which makes it unable to predict the effects of IPG or asymmetric pulses. Bruce et al. (1999b) used an RS value that was dependent on the phase duration of the stimulus. Although no physiological observations contradict this dependence for biphasic stimuli, it is known that the RS does not depend on the pulse duration for monophasic stimulation (Rubinstein 1995). RS was only shown to depend on the interpulse interval of the stimulus (Matsuoka et al. 2001). The present model uses a constant value for RS although this value (0.18) is much larger than what is typically measured in single-unit recordings of the cat (about 0.06; Miller et al. 1999c). This difference may partly come from the fact that every fiber in our model has the same threshold and that the current spread is uniform. Xu and Collins (2005) demonstrated that the effective RS (taking into account the entire neural population) was larger when individual fiber thresholds were uniformly distributed from −5 to +5 dB than when they were constant.
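The population argument can be sketched numerically. The code below assumes that each fiber fires with a Gaussian-CDF probability whose standard deviation equals RS times its threshold, and it defines an "effective RS" from the levels at which the population-averaged firing probability crosses Φ(−1), 0.5, and Φ(+1). Both this definition and the function names are assumptions made for illustration; they are not necessarily the procedure of Xu and Collins (2005).

```python
import math

def phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def population_probability(level, thresholds, rs):
    """Mean firing probability across fibers at a given stimulus level."""
    return sum(phi((level - th) / (rs * th)) for th in thresholds) / len(thresholds)

def level_for_probability(p, thresholds, rs):
    """Bisection for the level at which the population probability equals p."""
    lo, hi = 0.0, 10.0 * max(thresholds)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if population_probability(mid, thresholds, rs) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def effective_rs(thresholds, rs):
    """Half-width between the phi(-1) and phi(+1) levels over the 0.5 level."""
    i_lo = level_for_probability(phi(-1.0), thresholds, rs)
    i_hi = level_for_probability(phi(1.0), thresholds, rs)
    i_mid = level_for_probability(0.5, thresholds, rs)
    return (i_hi - i_lo) / (2.0 * i_mid)

N_FIBERS = 101          # hypothetical population size
RS_FIBER = 0.06         # single-fiber RS (cat single-unit value in the text)
constant = [1.0] * N_FIBERS
# Thresholds uniformly spaced in dB from -5 to +5, converted to linear units
spread = [10.0 ** ((-5.0 + 10.0 * k / (N_FIBERS - 1)) / 20.0) for k in range(N_FIBERS)]

print("constant thresholds:", round(effective_rs(constant, RS_FIBER), 3))
print("distributed thresholds:", round(effective_rs(spread, RS_FIBER), 3))
```

With identical thresholds the effective RS equals the single-fiber value by construction, whereas spreading the thresholds broadens the population function and raises it.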

The present model gives threshold predictions similar to those of the model of Carlyon et al. (2005). Our resonator and integrator filters can be viewed as a decomposition of Carlyon’s lowpass filter into two separate processes. However, their model cannot predict loudness growth, and predicting MCLs of CI users would probably require the implementation of a second filter, thus multiplying the number of variables.

A 20-ms rectangular central integration window was used in the present model. Other shapes, probably more realistic as suggested by studies with normal-hearing listeners (Moore et al. 1988), should be considered in the future. Carlyon et al. (2005) used a Hanning window with a total duration of 20 ms, whereas McKay et al. (2003) and Moore et al. (1996) used a window with exponential decays having an equivalent rectangular duration of 7 ms. Our model cannot account for the decrease in threshold observed at long stimulus durations, which may be due to more central processes and may be effectively modeled by additional mechanisms such as “multiple looks” (Viemeister and Wakefield 1991; Donaldson et al. 1997). It is, however, interesting to note that Moon et al. (1993) observed that at 100 pps the decrease in threshold with increases in stimulus duration depended on the phase duration. They found this decrease to be larger using a 1,536-μs phase compared to a 96-μs phase. This may be due to the resonator process, which would, at 100 pps, amplify the long-phase duration (1,536 μs) biphasic response but not the shorter one. Xu and Collins (2004) implemented a multiple looks approach in their stochastic AN model and compared its predictions to those of a long-term integration (100-ms duration) approach. They showed the multiple looks model to predict more trends of psychophysical data than the long-term integration model. They also found that in the case of a small number of stimulated fibers (N_f = 100), the multiple looks model predicted a nonmonotonic threshold vs rate function for biphasic stimuli.
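The window shapes mentioned above can be compared through their equivalent rectangular duration, taken here as the area under a unit-peak window. The sketch below is a numerical illustration under that definition; the function names are hypothetical, and the exponential window of McKay et al. (2003) is not reproduced because its decay parameters are not given here.

```python
import math

def rectangular(t: float, total_s: float) -> float:
    """Unit-height rectangular window on [0, total_s]."""
    return 1.0

def hann(t: float, total_s: float) -> float:
    """Unit-peak Hann (Hanning) window on [0, total_s]."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * t / total_s))

def equivalent_rectangular_duration(window, total_s: float, n: int = 20000) -> float:
    """Area under a unit-peak window (midpoint rule), i.e., the duration of
    the rectangular window with the same peak and the same area."""
    dt = total_s / n
    return sum(window((k + 0.5) * dt, total_s) for k in range(n)) * dt

for name, w in (("rectangular", rectangular), ("Hann", hann)):
    erd_ms = 1000.0 * equivalent_rectangular_duration(w, 0.020)
    print(f"{name}: {erd_ms:.2f} ms")
```

A Hann window with a total duration of 20 ms has an equivalent rectangular duration of 10 ms, half that of the rectangular window used in the present model.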

In a previous psychophysical study, we found MCLs of anodic-first PS stimuli to be higher than cathodic ones in monopolar mode (Macherey et al. 2006). The full-wave rectification used in our model makes it unable to predict such effects of leading polarity. Rattay (1989) studied the effects of electrode geometry on neural activation and showed that neural fibers were more symmetrically excited by bipolar electrodes than by monopolar ones. Therefore, our full-wave rectification hypothesis is probably a better approximation for bipolar mode than for monopolar mode, where it overestimates the contribution of one polarity over the other. A model of spatial excitation using, e.g., the activating function of Rattay (1989) may help to explore polarity effects and their dependence on the electrode-coupling mode.

Another limitation of the model lies in its inability to predict refractory effects. Neural refractoriness was not modeled and may not be the main determinant of loudness perception in single-channel stimulation. One reason could be that CIs operate at low levels of discharge probability, as previously suggested by Bruce et al. (1999a) and assumed in the present study. Another reason may be that, for many of the manipulations used here, refractory effects are more or less equivalent and do not strongly affect the ability of the model to account for the data.