1 Introduction

Basilar-membrane (BM) vibrations in healthy cochleae grow compressively with the intensity of stationary sounds such as tones and broadband noise (de Boer and Nuttall 1997; Rhode 1971; Robles and Ruggero 2001). Similarly compressive nonlinearities also affect mechanical responses to pulsatile stimuli such as clicks (e.g. Robles et al. 1976), and responses in two-tone experiments (Robles et al. 1997), where they reputedly act on a cycle-by-cycle basis to generate distinctive patterns of intermodulation distortion products. The extent to which a compressive input-output relation holds when the intensity of a stimulus fluctuates in time has not been studied systematically, however, and the possibility that the cochlea operates more like an automatic gain control (AGC) system than an instantaneously nonlinear system remains open (van der Heijden 2005). In the present study, we investigate the dynamics of the cochlea’s compressive mechanism(s) by observing BM responses to balanced, beating pairs of low-level tones near the recording site’s characteristic frequency (CF).

2 Methods

Sound-evoked BM vibrations were recorded from the basal turn of the cochlea in terminally anesthetised gerbils, using techniques similar to those reported previously (Versteegh and van der Heijden 2012). Acoustic stimuli included: (i) inharmonic multi-tone complexes (zwuis stimuli), which were used to characterise the tuning properties of each site on the BM (including the site’s CF); and (ii) “beating” pairs of inharmonic near-CF tones, with component levels adjusted to produce near-perfect periodic cancellations in the response envelopes at individual sites on the BM (cf. Fig. 1).

Fig. 1

a BM responses to near-CF beat stimuli. f1 = 17,373 Hz, 30 dB SPL; f2 = 18,656 Hz, 33 dB SPL; Δf = f2 − f1 = 1283 Hz. b Observed (solid blue line, from a) and linearly predicted (dashed blue line) response envelopes for Δf = 1283 Hz. c Observed (solid red line) and predicted (dashed red line) response envelopes for Δf = 20 Hz. d Normalised gain (= observed/linearly predicted envelope) for Δf = 20 Hz (red) and 1283 Hz (blue). Experiment RG15641, CF = 18 kHz

3 Results

3.1 Time-Domain Observations

BM responses to well balanced, “beating” two-tone stimuli had heavily compressed envelopes, as illustrated in Fig. 1. The amount of compression was quantified by comparing the shapes of the observed response envelopes with those that would be predicted in a completely linear system (illustrated by the dashed blue and red lines in Figs. 1 b and c, respectively). When scaled to have similar peak magnitudes, the observed envelopes exceeded the linear predictions across most of the beat cycle (cf. Fig. 1 b, c). Expressing the ratios of the observed and predicted envelopes as “normalised gains” (Fig. 1 d), the envelope’s maximum gain always occurred near (but not exactly at) the instant of maximal cancellation between the two beating tones. This is as expected in a compressive system, where (by definition) “gain” decreases with increasing intensity, or increases with decreasing intensity.
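The normalised-gain computation described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names, the component amplitudes (1.0 and 0.9, deliberately unbalanced to avoid dividing by zero at the null) and the cube-root compressor standing in for the "observed" envelope are all assumptions made for the example.

```python
import numpy as np

def linear_envelope(t, a1, a2, delta_f):
    """Envelope of two summed tones (linear amplitudes a1, a2) beating at delta_f Hz."""
    return np.abs(a1 + a2 * np.exp(2j * np.pi * delta_f * t))

def normalised_gain(observed_env, predicted_env):
    """Observed / linearly predicted envelope, after scaling the prediction
    so that both envelopes have the same peak magnitude."""
    scaled = predicted_env * observed_env.max() / predicted_env.max()
    return observed_env / scaled

# Hypothetical example: one 20 Hz beat cycle passed through an
# instantaneous power-law compressor (exponent 1/3, an assumed value).
delta_f = 20.0
t = np.linspace(0.0, 1.0 / delta_f, 1000, endpoint=False)
predicted = linear_envelope(t, 1.0, 0.9, delta_f)
observed = predicted ** (1.0 / 3.0)
gain = normalised_gain(observed, predicted)
```

For such an instantaneous compressor the gain is never below 1 and peaks exactly at the envelope null; the small offset between the two in the measured data is what the instantaneous model fails to reproduce.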

3.2 Temporal Asymmetry

At low beat rates (e.g., for Δf = f2 − f1 ≈ 10–160 Hz), observed response envelopes appeared symmetric in the time domain (cf. Fig. 1 c). At higher rates, however, the rising flanks of the response envelopes became steeper than the falling flanks (cf. Fig. 1 a, b). This temporal asymmetry reflects hysteresis, and is reminiscent of the output of a gain control system with separate attack and decay time-constants.

3.3 “Gain Control” Dynamics

Three further response characteristics (in addition to the hysteresis) became more pronounced at higher beat rates: (1) the point at which the observed envelopes exhibited their maximal gains (relative to the linearly predicted responses) occurred later and later in the beat cycle, (2) the overall amount of envelope gain decreased, and (3) the peakedness of this gain within the beat cycle decreased (compare the red and blue curves in Fig. 1 d). These characteristics were confirmed and quantified by spectral analysis of the envelopes’ temporal gain curves, as illustrated in Fig. 2.

Fig. 2

Spectral analysis of beat response envelope gain in experiment RG15641. a Magnitude and b phase of the envelope gain functions (cf. Fig. 1 d) as a function of beat rate (Δf = f2 − f1). Coloured curves represent the contributions of the different harmonics of Δf (labelled in a) to the envelope gain functions at each beat rate. c Magnitude and d phase of the envelope transfer function derived from a and b. Coloured symbols from a and b are plotted at the actual frequencies of the spectral components (i.e. at f = n·Δf) for the whole family of beat rates tested (20–2560 Hz), and magnitudes are normalized re: the quasi-static (QS, Δf = 20 Hz) harmonics

The left panels of Fig. 2 show spectral decompositions of the envelope gain functions (cf. Fig. 1 d) for a wide range of stimulus beat rates. Response energy at each harmonic of the beat rate decreases as the beat rate increases (Fig. 2 a), suggesting that envelope gain is subjected to a low-pass filtering (or smoothing) process before it is observed on the BM: this is the spectral counterpart to the decreased gain at higher beat rates (point 2) referred to above. Response energy decreases more rapidly with increasing frequency for the higher harmonic numbers in Fig. 2 (i.e., the coloured curves in Fig. 2 a are more closely spaced at low beat rates, and more widely spaced at higher rates): this is the spectral counterpart of the decreased peakedness of the temporal gain curves (point 3) referred to above.
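A spectral decomposition of this kind can be sketched with a discrete Fourier transform of one beat cycle of the gain curve. The sketch below assumes the gain is sampled uniformly over exactly one beat period, so that FFT bin n corresponds to harmonic n of Δf; the function name and the synthetic two-harmonic gain curve are hypothetical.

```python
import numpy as np

def gain_harmonics(gain, n_harmonics):
    """Complex amplitudes of harmonics 1..n_harmonics of the beat rate.

    `gain` must cover exactly one beat cycle (endpoint excluded).
    A negative phase angle corresponds to a phase lag.
    """
    spectrum = np.fft.rfft(gain) / len(gain)
    return 2.0 * spectrum[1:n_harmonics + 1]

# Hypothetical gain curve: two harmonics, both delayed by 0.1 beat cycles.
x = np.arange(1000) / 1000.0            # time in fractions of a beat cycle
delay = 0.1
gain = (1.0
        + 0.50 * np.cos(2 * np.pi * 1 * (x - delay))
        + 0.25 * np.cos(2 * np.pi * 2 * (x - delay)))

coeffs = gain_harmonics(gain, 2)
magnitudes = np.abs(coeffs)
phase_cycles = np.angle(coeffs) / (2 * np.pi)
```

Here the recovered magnitudes are the cosine amplitudes, and the phases (in cycles) are the negative of the delay multiplied by the harmonic number.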

The low-pass filtering apparent in the spectral analysis of the beat responses can be characterised by deriving a compound transfer function for the response envelope gain, as illustrated in Figs. 2 c and d. This shows the amount of energy in the envelope gain function at each harmonic of the beat rate, after normalization by the BM’s quasi-static (i.e., Δf → 0) input-output function. In the case of Fig. 2 c, we have used the 20 Hz beat rate data to normalize the data at higher frequencies, and plotted each of the harmonics at its own absolute frequency (i.e. at f = n·Δf, where n is the harmonic number). This reveals a low-pass “gain control” transfer function with a −3 dB cut-off at ~ 1 kHz.
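The normalization step can be sketched as below. The data layout (a dictionary mapping each beat rate to its complex harmonic amplitudes) and the function name are assumptions, and the synthetic data are generated from a first-order low-pass filter with a 1 kHz cut-off purely for illustration; the filter order of the measured transfer function is not specified here.

```python
import numpy as np

def envelope_transfer_function(harmonics_by_rate, quasi_static_rate):
    """Divide harmonic n at every beat rate by harmonic n at the quasi-static
    rate, and place each result at its absolute frequency n * beat rate.
    Returns (frequencies, complex transfer values), sorted by frequency."""
    reference = harmonics_by_rate[quasi_static_rate]
    freqs, values = [], []
    for rate, coeffs in harmonics_by_rate.items():
        for n, coeff in enumerate(coeffs, start=1):
            freqs.append(n * rate)
            values.append(coeff / reference[n - 1])
    order = np.argsort(freqs)
    return np.asarray(freqs)[order], np.asarray(values)[order]

def first_order_lowpass(f, cutoff_hz=1000.0):
    """Assumed filter shape, used only to generate the synthetic data."""
    return 1.0 / (1.0 + 1j * f / cutoff_hz)

# Hypothetical harmonic amplitudes for three beat rates and three harmonics.
quasi_static = np.array([1.0, 0.5, 0.25])
harmonic_numbers = np.arange(1, 4)
data = {rate: quasi_static * first_order_lowpass(harmonic_numbers * rate)
        for rate in (20.0, 320.0, 1280.0)}

freqs, transfer = envelope_transfer_function(data, 20.0)
magnitude_db = 20.0 * np.log10(np.abs(transfer))
```

Because the reference is taken at 20 Hz rather than at Δf → 0, the result is the transfer function relative to its own value at n·20 Hz, which is close to unity for a filter with a cut-off near 1 kHz.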

The energy at each harmonic of the beat rate in the envelope gain function also occurs later in the beat cycle as the beat rate increases, as shown in Figs. 2 b and d (negative phases in Fig. 2 indicate phase lags). The phases in Fig. 2 b scale almost proportionally with harmonic number, suggesting that (on average) the observed gain curves lag the effective stimuli (the linearly predicted envelopes) by a constant amount of time. This delay is approximately 50 μs for most of our data (equivalent to 0.2 cycles of phase lag over a 4 kHz range, as shown in Fig. 2 d). These observations are the spectral counterparts of the delays observed in the instants of maximum envelope gain mentioned in Sect. 3.1 (cf. Fig. 1 d, where both the red and blue curves peak 50 μs after the null in the observed response envelope).
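The relation between a constant delay and the phase slope can be checked numerically: a pure delay τ gives a phase of −f·τ cycles at frequency f, so 50 μs corresponds to −0.2 cycles at 4 kHz. The sketch below estimates τ from phase data by a least-squares line through the origin; the function name and the sample frequencies are hypothetical, and the phases are assumed to be already unwrapped.

```python
import numpy as np

def delay_from_phase(freqs_hz, phase_cycles):
    """Least-squares estimate of a constant delay (s), assuming
    phase_cycles = -freqs_hz * delay (unwrapped phase, lags negative)."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    phase_cycles = np.asarray(phase_cycles, dtype=float)
    return -np.dot(freqs_hz, phase_cycles) / np.dot(freqs_hz, freqs_hz)

# Synthetic phases for a pure 50 microsecond delay.
freqs = np.array([500.0, 1000.0, 2000.0, 4000.0])
phases = -freqs * 50e-6
estimated_delay = delay_from_phase(freqs, phases)
```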

3.4 Linearization of Function

The observed decreases in envelope gain and in gain peakedness with increasing beat rate act synergistically to reduce the amount of compression exhibited in the BM response envelopes at high beat rates. This “linearization” is illustrated directly in Fig. 3 by plotting the response envelopes as a function of the instantaneous intensity of the beating stimulus. The decreases in gain are observed as clockwise rotations in the linear coordinates of Fig. 3 a, and as downward shifts on the logarithmic plots of Fig. 3 b. The linearization that accompanies the decreases in gain is seen as an increase in the threshold above which compression is seen (e.g. the blue 20 Hz data of Fig. 3 b start to deviate from linearity above ~ 10 dB SPL and ~ 0.3 nm, whereas the steepest, rising flank of the black 1280 Hz data remains nearly linear up to ~ 25 dB SPL and 2 nm). One consequence of this linearization is that the BM’s response dynamics can actually enhance the encoding of dynamic stimuli, within certain limits. However, this “enhancement” comes at the price of reduced overall sensitivity.
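Growth rates on an instantaneous input-output plot can be expressed in dB/dB by fitting a straight line to the output level against the input level. The sketch below is illustrative only: the function name is hypothetical and the two test curves (a linear system and a cube-root compressor) are synthetic, chosen because they give slopes of 1 and 1/3 dB/dB.

```python
import numpy as np

def growth_rate_db_per_db(input_env, output_env):
    """Slope of output level (dB) against input level (dB),
    from a first-order least-squares fit."""
    in_db = 20.0 * np.log10(input_env)
    out_db = 20.0 * np.log10(output_env)
    slope, _intercept = np.polyfit(in_db, out_db, 1)
    return slope

# Synthetic input envelope spanning 40 dB.
input_env = np.logspace(-1, 1, 50)
linear_slope = growth_rate_db_per_db(input_env, 2.0 * input_env)
compressive_slope = growth_rate_db_per_db(input_env, input_env ** (1.0 / 3.0))
```

Applied separately to the rising and falling flanks of a measured envelope, the same fit would show how far each flank departs from linear growth.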

Fig. 3

Response linearization with increasing beat rate (experiment RG14612, CF = 15 kHz). Instantaneous envelope input-output functions for three beat stimuli on a linear and b logarithmic scales. Arrows in a distinguish rising and falling phases of the 1280 Hz data. Dashed lines in b show linear (1 dB/dB) and compressive (0.333 dB/dB) growth rates

4 Discussion

The mechanical nonlinearity of the mammalian cochlea is clearly fast, but not instantaneous. The results of the current investigation show that dynamic changes in stimulus intensity can only be followed accurately up to a rate of ~ 1 kHz in the basal turn of the gerbil cochlea. This limitation appears to be produced by a low-pass gain control filter of some kind: this could be a mechanical filter, imparted by the load on a force-generating element (such as a motile outer hair cell, or OHC), or an electrical filter (such as the OHC’s basolateral membrane, cf. Housley and Ashmore 1992). The filter could even involve a combination of multiple, coupled mechanisms: preliminary analysis suggests that the gain at one site on the BM may be controlled remotely, presumably by elements (e.g., OHCs) that are distributed more basally along the BM.
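The qualitative effect of a low-pass filter in the gain control path can be illustrated with a toy feed-forward model. This is not a model of the cochlea and is not taken from the study: the first-order filter, the 1 kHz cut-off, the cube-root compression law, the component amplitudes and all function names are assumptions chosen for illustration. The gain is set by a smoothed copy of the stimulus envelope rather than by the envelope itself.

```python
import numpy as np

def toy_gain_control(delta_f, cutoff_hz=1000.0, exponent=1.0 / 3.0,
                     n_cycles=6, samples_per_cycle=2000):
    """Return (envelope, gain) over the final beat cycle of the toy model."""
    dt = 1.0 / (delta_f * samples_per_cycle)
    t = np.arange(n_cycles * samples_per_cycle) * dt
    # Linear envelope of a slightly unbalanced two-tone beat.
    envelope = np.abs(1.0 + 0.9 * np.exp(2j * np.pi * delta_f * t))
    # First-order low-pass filter applied to the envelope (control signal).
    tau = 1.0 / (2.0 * np.pi * cutoff_hz)
    alpha = 1.0 - np.exp(-dt / tau)
    control = np.empty_like(envelope)
    state = envelope[0]
    for i, value in enumerate(envelope):
        state += alpha * (value - state)
        control[i] = state
    # Gain that would give power-law compression if the control were instantaneous.
    gain = control ** (exponent - 1.0)
    last = slice(-samples_per_cycle, None)   # discard the start-up cycles
    return envelope[last], gain[last]

def gain_peak_lag(envelope, gain):
    """Lag of the gain maximum behind the envelope null, in beat cycles."""
    n = len(envelope)
    return ((np.argmax(gain) - np.argmin(envelope)) % n) / n

for rate in (20.0, 1280.0):
    env, gain = toy_gain_control(rate)
    print(rate, gain.max(), gain_peak_lag(env, gain))
```

In this toy model the gain maximum is smaller, and occurs later in the beat cycle, at 1280 Hz than at 20 Hz, which is the direction of the trends reported in Sect. 3.3. The model makes no claim about the physical origin of the filter.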

Our observation of hysteresis in the BM’s response to a dynamic stimulus is not entirely new: similarly asymmetric envelope responses have been reported in studies using amplitude-modulated tonal stimuli (e.g., Rhode and Recio 2001). Our observation of a distinct delay to the appearance of compression (or gain) in the response envelopes is new, however. We currently believe that it is impossible to produce such a delay without resorting to a non-instantaneous form of gain control, such as an AGC. Other (non-AGC) types of system which produce level-dependent group delays can easily mimic the envelope asymmetry (hysteresis) effects that we observe, but (to the best of our knowledge) they cannot produce an actual delay in the “effect” of the nonlinearity. If this delay really does signify the presence of an AGC in the cochlea, it may well prove significant for the coding of a wide range of stimuli, and have consequences for numerous psychophysical phenomena.