INTRODUCTION

The ability to perceptually separate simultaneous sounds of different frequencies is a fundamental property of the auditory system. The filtering of sounds according to frequency and the resulting tonotopic organization found throughout the auditory pathways to the auditory cortex (Read et al. 2002) have their basis in the electromechanical properties of the cochlea. Indeed, it is widely believed that frequency selectivity, measured behaviorally, can be regarded as a direct reflection of the filtering that takes place in the cochlea.

Estimates of frequency selectivity in humans have evolved over many decades as numerous confounding factors and potential artifacts were addressed (e.g., Wegel and Lane 1924; Fletcher 1940; Zwicker et al. 1957; Bos and de Boer 1966; Houtgast 1973; Patterson and Nimmo–Smith 1980; Moore et al. 1984; Glasberg and Moore 2000). The psychophysical tuning curve (PTC) (Houtgast 1973; Moore 1978) provides a paradigm that is, in principle, most similar to physiological measures of cochlear tuning, such as the neural tuning curve. However, issues such as off-frequency listening and the detection of beats and/or distortion products can make the results from PTC experiments difficult to interpret (Johnson–Davies and Patterson 1979; O’Loughlin and Moore 1981; Patterson and Moore 1986). Over the last two decades, the notched-noise technique has become the favored method of behaviorally estimating frequency selectivity (Patterson 1976; Patterson and Nimmo–Smith 1980; Moore 1987; Rosen et al. 1998; Glasberg and Moore 2000). This technique involves measuring the masked threshold of a sinusoidal signal in the presence of a noise with a spectral notch as a function of the width and position of the notch relative to the signal. The filter functions derived from such techniques are often referred to as auditory filters and are believed to reflect the filtering properties of the cochlea (Moore 1995). One of the most comprehensive studies of frequency selectivity using notched-noise maskers provides a function that can be used to calculate the estimated equivalent rectangular bandwidth (ERB) of the auditory filter at any given frequency (Glasberg and Moore 1990). The equation for the function is

$$\mathrm{ERB} = 24.7\,(4.37 f / 1000 + 1) \qquad (1)$$

where ERB is the equivalent rectangular bandwidth and f is the filter center frequency, both in Hz. This function has been used as an estimate of human cochlear tuning in a wide range of studies and applications (e.g., Beauvois and Meddis 1996; Dau et al. 1996; 1997; Moore et al. 1997; Breebaart et al. 2001). In all these studies, the auditory filters are assumed to be linear in any given condition, although it is explicitly acknowledged that they change their shape with level.
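As a worked illustration, the ERB function can be evaluated at the signal frequencies used later in this study. This is a minimal sketch that assumes the commonly cited Glasberg and Moore (1990) form, ERB = 24.7(4.37f/1000 + 1) with f in Hz; the function name `erb_gm90` is ours and is not taken from any published code.

```python
def erb_gm90(f_hz):
    """ERB in Hz of the auditory filter centered at f_hz (in Hz).

    Assumed form: ERB = 24.7 * (4.37 * f / 1000 + 1).
    """
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


# Evaluate at the five signal frequencies tested in this study.
for f_hz in (1000, 2000, 4000, 6000, 8000):
    print(f"{f_hz:5d} Hz: ERB = {erb_gm90(f_hz):6.1f} Hz")
```

Under this assumption the predicted ERB at 1 kHz is roughly 133 Hz.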

While ignoring cochlear nonlinearities for the sake of simplicity has certain advantages, some important effects are dependent on these nonlinearities. One consequence of cochlear nonlinearity is known as suppression, whereby the neural response to one tone can be reduced by the introduction of a second, suppressor, tone (Sachs and Kiang 1968). It has been known since this effect was first investigated behaviorally that estimates of cochlear tuning can be altered by effects ascribed to suppression. In general, estimates using simultaneous masking, where the masker is thought to suppress the signal to some degree, produce wider estimates of filter bandwidth than do estimates using nonsimultaneous masking, where the masker does not peripherally interact with, or suppress, the signal (Houtgast 1973, 1974; Moore 1978; Vogten 1978; Moore et al. 1987). Consistent with these findings, Heinz et al. (2002) showed in a recent modeling study that when their nonlinear model was used to predict thresholds in a simultaneous masking notched-noise experiment, the resulting estimated auditory filter bandwidths were wider than the bandwidths of the filters actually used in the model. In other words, the nonlinearities in the model in combination with the simultaneous notched-noise method resulted in biased estimates of filter bandwidth. As their model incorporated stronger nonlinearity at high frequencies than at low frequencies (in line with relevant physiological data), the bias effects increased with increasing center frequency.

The prediction that suppression may affect tuning estimates more at high characteristic frequencies (CFs) than at low CFs is important: If suppression simply changes the effective tuning by a constant factor, regardless of frequency, then this would be easy to incorporate into current functions relating auditory filter ERB to frequency (Glasberg and Moore 1990). On the other hand, if suppression effects change with frequency then this could affect the entire shape of the ERB function. In either case, as has been pointed out previously (e.g., Glasberg and Moore 2000), it is clear that caution must be exercised when relating data from simultaneous-masking experiments to cochlear tuning, as measured in physiological experiments.

Ideally, a psychoacoustic measure of cochlear tuning would share many of the properties of neural measures of tuning. Most importantly, perhaps, the stimuli should be at a low level (where tuning is generally sharpest) and the masker and signal should not be presented simultaneously (to avoid suppression effects). There are surprisingly few modern studies of frequency selectivity that meet these criteria. Using the notched-noise method, Glasberg and Moore (1982) measured auditory filter shapes in forward masking for a fixed low-level 1-kHz signal. They found a 3-dB bandwidth of about 85 Hz. Using a rounded-exponential (roex) model, as they did, this corresponds to an ERB of about 100 Hz—about a factor of 1.3 less than the value given by Eq. (1). Using a relatively high masker level, Moore et al. (1987) compared auditory filter shapes in forward and simultaneous masking directly and also found that the resulting ERBs differed by a factor of about 1.3. To our knowledge, there are no notched-noise studies using forward masking that have measured frequency selectivity at any signal frequency other than 1 kHz. There are both physiological (Cooper and Yates 1994; Rhode and Cooper 1996) and psychophysical (Hicks and Bacon 1999; Plack and Oxenham 2000) indications that nonlinearity increases at high CFs. Furthermore, Heinz et al. (2002) showed in their model that increasing nonlinearity leads to increasingly biased estimates of cochlear tuning when using simultaneous masking. Thus, it is possible that the differences between estimates using forward masking and those using simultaneous masking will increase with increasing CF. If, as is often assumed, forward masking produces a measure that is more comparable to physiological measures of cochlear tuning, then knowledge of how frequency selectivity in forward masking changes with CF should provide valuable information on the nature of human cochlear tuning.
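The conversion from a 3-dB bandwidth of about 85 Hz to an ERB of about 100 Hz can be checked numerically. The sketch below assumes a symmetric, single-parameter roex(p) filter, W(g) = (1 + pg)exp(−pg), which may not be the exact form fitted by Glasberg and Moore (1982). Under that assumption the ratio of ERB to 3-dB bandwidth does not depend on p. The helper names are hypothetical.

```python
import math


def half_power_point(tol=1e-12):
    """Solve (1 + x) * exp(-x) = 0.5 for x = p*g by bisection.

    The left-hand side decreases monotonically from 1 for x > 0,
    so a single root lies between 0 and 10.
    """
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (1.0 + mid) * math.exp(-mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def erb_from_3db_bandwidth(bw3_hz):
    """ERB of a symmetric roex(p) filter with the given 3-dB bandwidth.

    3-dB bandwidth = 2 * (x / p) * fc and ERB = 4 * fc / p,
    so ERB / bandwidth = 2 / x regardless of p and fc.
    """
    return bw3_hz * 2.0 / half_power_point()


print(f"ERB for an 85-Hz 3-dB bandwidth: {erb_from_3db_bandwidth(85.0):.1f} Hz")
```

With these assumptions the ratio is close to 1.19, so an 85-Hz 3-dB bandwidth maps to an ERB slightly above 100 Hz, in line with the value quoted in the text.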

This study measured frequency selectivity in forward masking using the notched-noise method for signal frequencies between 1 and 8 kHz, with the aim of providing improved estimates of human cochlear tuning as a function of frequency. For comparison, data from simultaneous-masking conditions were collected in the same subjects. In an effort to match the conditions of neural tuning curves as closely as possible, the signal was presented at a fixed low level and the masker level was varied to measure threshold. This fixed-signal method has been shown to be preferable to a fixed-noise method for notched-noise measures on both empirical and theoretical grounds (Rosen and Baker 1994; Rosen et al. 1998; Glasberg and Moore 2000).

EXPERIMENT: AUDITORY FILTER SHAPES AT LOW LEVELS IN FORWARD AND SIMULTANEOUS MASKING

Stimuli

Thresholds were measured for a 10-ms signal (5-ms raised-cosine ramps; no steady state) in the presence of 400-ms bands of noise, also gated with 5-ms raised-cosine ramps. In the forward-masking conditions, the silent interval or gap between the masker and the signal was 5 ms (defined as the duration over which the envelope voltage was 0 V). In the simultaneous-masking conditions, the onset of the signal occurred 380 ms after the masker onset. Thresholds were measured for signal frequencies ($f_s$) of 1, 2, 4, 6, and 8 kHz. The masker consisted of two bands of Gaussian noise centered below and above the signal frequency, each with a bandwidth of $0.25 f_s$. The noises were generated in the spectral domain and were bandlimited by setting all spectral components outside the desired passband to zero. In that way, the slope of the filtering was limited only by the 5-ms onset and offset ramps of the masker. The spectral notch width between the two noises provided the independent variable at each signal frequency. The notch was defined as the normalized deviation of the closer edge of each noise, $\Delta f$, from the signal frequency, i.e., $\Delta f / f_s$. There were five conditions in which the notch was placed symmetrically about the signal: values of $\Delta f / f_s$ were 0 (no spectral notch), 0.1, 0.2, 0.3, and 0.4. Two asymmetric conditions were also tested, in which the upper and lower normalized deviations were 0.2 and 0.4, respectively, or vice versa. This provided a total of seven conditions at each signal frequency.
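The seven notch conditions can be made concrete by computing the edges of the two noise bands at a given signal frequency. This is an illustrative sketch only; the names `CONDITIONS` and `masker_band_edges` are ours, and each pair is written as (lower deviation, upper deviation).

```python
BAND = 0.25  # bandwidth of each noise band as a proportion of the signal frequency

# (lower, upper) normalized deviations of the notch edges from the signal:
# five symmetric conditions followed by the two asymmetric ones.
CONDITIONS = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (0.4, 0.4),
              (0.2, 0.4), (0.4, 0.2)]


def masker_band_edges(fs_hz, dev_lower, dev_upper):
    """Return ((low, high), (low, high)) edges in Hz of the lower and upper bands."""
    lower_band = (fs_hz * (1.0 - dev_lower - BAND), fs_hz * (1.0 - dev_lower))
    upper_band = (fs_hz * (1.0 + dev_upper), fs_hz * (1.0 + dev_upper + BAND))
    return lower_band, upper_band


for dev_lower, dev_upper in CONDITIONS:
    lower_band, upper_band = masker_band_edges(1000.0, dev_lower, dev_upper)
    print(f"notch ({dev_lower:.1f}, {dev_upper:.1f}): "
          f"lower band {lower_band[0]:.0f}-{lower_band[1]:.0f} Hz, "
          f"upper band {upper_band[0]:.0f}-{upper_band[1]:.0f} Hz")
```

At the highest signal frequency (8 kHz) and the widest upper deviation (0.4), the upper band extends to 13.2 kHz, which stays below the 16-kHz Nyquist frequency implied by the 32-kHz sampling rate reported in this section.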

All stimuli were generated digitally at a sampling rate of 32 kHz and were played out via a LynxStudio LynxOne soundcard at 16-bit resolution. In the forward-masking conditions, where the level difference between the signal and masker could be very large, the masker and signal were passed through different programmable attenuators (TDT PA4) before being mixed (TDT SM3) and passed through a headphone buffer (TDT HB6). In the simultaneous-masking conditions, where the level differences between the masker and signal were not as great, the masker and signal were added digitally before being passed through a single programmable attenuator and headphone buffer. The stimuli were presented monaurally in a double-walled sound-attenuating booth via Etymotic Research ER2 insert earphones, which are designed to provide a flat frequency response at the eardrum up to about 14 kHz.

Procedure

Initially, thresholds in quiet were measured for the signals at all the test frequencies. This was done using a three-interval three-alternative forced-choice method with a two-down one-up adaptive procedure that tracks the 70.7%-correct point on the psychometric function. Intervals were marked on a virtual response box on a flat-panel monitor located in the booth, responses were made via the computer keyboard or mouse, and feedback was provided after each trial. A run was terminated after 10 reversals. The step size was 8 dB for the first two reversals, 4 dB for the following two, and 2 dB thereafter. Threshold was defined as the mean signal level at the last 6 reversals. For each subject and frequency, a mean threshold was calculated from at least three repetitions of each condition.
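The adaptive rule described above can be sketched as follows. Two consecutive correct responses lower the signal level and one incorrect response raises it, so the track converges on the level at which the probability of a correct response p satisfies p² = 0.5, i.e., about 70.7%. The step schedule is read here as 8 dB until two reversals have occurred, 4 dB until four, and 2 dB afterwards. The simulated listener, its slope, and the starting level are hypothetical and serve only to exercise the track.

```python
import math
import random


def step_size(n_reversals):
    """Step in dB: 8 until two reversals have occurred, 4 for the next two, then 2."""
    if n_reversals < 2:
        return 8.0
    if n_reversals < 4:
        return 4.0
    return 2.0


def run_track(respond, start_level, n_reversals=10, n_average=6):
    """Two-down one-up track; returns the mean level at the last n_average reversals."""
    level = start_level
    correct_in_row = 0
    direction = 0  # -1 = last move was down, +1 = up, 0 = no move yet
    reversals = []
    while len(reversals) < n_reversals:
        if respond(level):
            correct_in_row += 1
            if correct_in_row < 2:
                continue  # wait for a second consecutive correct response
            correct_in_row = 0
            new_direction = -1
        else:
            correct_in_row = 0
            new_direction = +1
        if direction != 0 and new_direction != direction:
            reversals.append(level)  # the track changed direction at this level
        direction = new_direction
        level += new_direction * step_size(len(reversals))
    return sum(reversals[-n_average:]) / n_average


def simulated_listener(threshold_db, slope=0.5, chance=1.0 / 3.0):
    """Hypothetical listener: logistic detection plus guessing in a 3AFC task."""
    def respond(level_db):
        p_detect = 1.0 / (1.0 + math.exp(-slope * (level_db - threshold_db)))
        return random.random() < chance + (1.0 - chance) * p_detect
    return respond


random.seed(1)
estimate = run_track(simulated_listener(25.0), start_level=50.0)
print(f"Estimated threshold: {estimate:.1f} dB SPL")
```

For a track on masker level with a fixed signal, the same logic applies with the direction of the level changes inverted.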

The absolute threshold values for each listener and frequency were then used to determine the individual fixed signal level in the masking experiments. Eight subjects were tested in the forward-masking conditions. The signal was presented 10 dB above its threshold in quiet (referred to here as 10 dB SL) and the masker level was adaptively varied. One group of four subjects (who were tested earlier) were also tested in simultaneous masking, with the signal presented at 10 dB SL. The other four subjects (who participated later) were tested in simultaneous masking with the signal presented at 35 dB SL.

The presentation and tracking procedure in the masking experiments was the same as for the absolute thresholds, except that the masker level increased after two consecutive correct responses and decreased after each incorrect response. Also, a run was terminated after 12 reversals and the mean masker spectrum level at the last 8 reversals was defined as the threshold. Any run in which the standard deviation (SD) of the levels at those reversals was greater than 4 dB was discarded.

After three successful runs (SD < 4 dB) of a given condition with a given subject were completed, the mean and SD across runs were calculated. If the SD across the three runs exceeded 4 dB, another run was undertaken, and the mean and SD of the last three runs were calculated. If the SD still exceeded 4 dB, this procedure was repeated until a total of six runs had been undertaken. In the very rare event of the SD of the last three runs still exceeding 4 dB, data from all six runs were combined and the mean and SD across all six runs are reported.
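The run-combination rule can be written as a small decision function that is called after each accepted run. This is a sketch under stated assumptions: the SD is taken to be the sample SD (the text does not specify), and the function name and return convention are ours.

```python
import statistics


def summarize_runs(runs, criterion_db=4.0, max_runs=6):
    """Apply the across-run SD rule to accepted run thresholds, in collection order.

    Returns (mean, sd, n_used) once a value can be reported, or None if
    another run is needed.
    """
    if len(runs) < 3:
        return None
    last_three = runs[-3:]
    sd = statistics.stdev(last_three)  # sample SD; an assumption
    if sd <= criterion_db:
        return statistics.mean(last_three), sd, 3
    if len(runs) < max_runs:
        return None  # SD too large: collect another run
    # Six runs collected and the last three still disagree: pool all runs.
    return statistics.mean(runs), statistics.stdev(runs), len(runs)


print(summarize_runs([30.0, 31.0, 32.0]))
print(summarize_runs([30.0, 40.0, 50.0]))
print(summarize_runs([30.0, 40.0, 50.0, 41.0, 42.0, 43.0]))
```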

Subjects

Eight normal-hearing listeners (four females, four males), aged between 18 and 29, participated in these experiments. All had absolute thresholds at octave frequencies between 250 and 8000 Hz of 15 dB HL or lower. All were students or recent graduates in the Boston area and were paid for their participation. Listeners were given about 2 hours of practice, after which their performance appeared stable. The experimental protocol was approved by MIT’s committee on the use of human experimental subjects and written informed consent was obtained from all listeners.

Results

The pattern of results was similar across listeners and so only the mean results are shown. Mean thresholds in quiet (and between-subject SDs in parentheses) for the 10-ms signal presented over ER2 insert earphones were 25.0 (2.9), 26.5 (4.0), 26.3 (5.3), 29.0 (2.6), and 31.7 (3.1) dB SPL at 1, 2, 4, 6, and 8 kHz, respectively. Within-subject SDs were typically less than 2 dB. Thresholds in forward masking were measured in all listeners with the signal presented 10 dB above the individual listener’s threshold in quiet. The mean results across the eight listeners are shown in Figure 1. Error bars denote ±1 SD of the mean. As expected, the circles show that the masker level necessary to mask the signal increases with increasing symmetric notch width. The left- and right-pointing triangles denote the two asymmetric notch conditions. If the auditory filters were symmetric, both conditions would produce the same threshold; larger differences suggest greater asymmetries. In Figure 1, the fact that the left-pointing triangles are consistently higher than the right-pointing triangles suggests that the auditory filters are asymmetric, with a steeper high-frequency slope.

Figure 1

Mean data from the forward-masking condition. Masker level at threshold is plotted as a function of notch width. Circles denote conditions with symmetrically placed spectral notches; triangles denote asymmetric conditions where the lower and upper edges of the notch are $0.2 f_s$ and $0.4 f_s$ (right-pointing triangles) or $0.4 f_s$ and $0.2 f_s$ (left-pointing triangles) from the signal frequency, respectively. Error bars denote ±1 SD of the mean across the eight listeners. The signal was fixed at 10 dB above the threshold in quiet for each listener individually.

Figure 2 shows the mean results from the subset of four listeners who were tested in simultaneous masking with a 10-dB SL signal (mean absolute thresholds in this subset were 28.8, 28.9, 26.9, 28, and 30 dB SPL at 1, 2, 4, 6, and 8 kHz, respectively). A cursory comparison of Figures 1 and 2 suggests a difference in frequency selectivity, depending on whether the signal was presented simultaneously (Fig. 2) or nonsimultaneously (Fig. 1) with the masker. First, the increase in masker level with increasing notch width appears shallower in the simultaneous-masking case, suggesting poorer effective frequency selectivity. Second, the asymmetry of masking (indicated by the relative heights of the triangles) in the simultaneous-masking case is reversed compared with the nonsimultaneous case for all frequencies. It should be noted that all references to filter asymmetry are based on only two data points in each condition. Nevertheless, the trends discussed here were consistent across listeners.

Figure 2

Mean data from the simultaneous-masking condition, with the signal level fixed at 10 dB above listeners’ thresholds in quiet. The symbols represent the mean of the four listeners who participated in this condition. Other aspects of the figure are as in Figure 1.

Figure 3 shows the mean results from the subset of four listeners who were tested in simultaneous masking with a 35-dB SL signal (mean absolute thresholds in this subset of listeners were 23.3, 24.1, 25.8, 30, and 33.3 dB SPL at 1, 2, 4, 6, and 8 kHz, respectively). The slope of the masking function seems similar to that found for simultaneous masking at the lower signal level (Fig. 2) and hence shallower than in the forward-masking case (Fig. 1). However, the asymmetry of masking in Figure 3 is consistent with that found in the forward-masking data, not with that found in the simultaneous-masking data at the lower level (Fig. 2). These issues are discussed in more detail in the following section, within the context of the auditory-filter model.

Figure 3

Mean data from the simultaneous-masking condition, with the signal level fixed at 35 dB above listeners’ thresholds in quiet. The symbols represent the mean of the four listeners who participated in this condition. Other aspects of the figure are as in Figure 1.

DERIVING AUDITORY FILTERS FROM THE DATA

Methods

Both the individual and mean data were used to derive auditory filter shapes. Briefly, two versions of the roex(p,w,t) function (Glasberg et al. 1984) were used in conjunction with the middle ear function, described by Moore et al. (1997). These choices and their rationale are described in more detail below.

Outer and middle ear transfer functions.

The insert earphones used in this study (Etymotic Research ER2) are designed to produce a flat transfer function at the eardrum for frequencies up to about 14 kHz. For this reason no filtering was used within the model to simulate outer ear filtering. Two approaches were used in simulating the middle ear transfer function. The first was to ignore middle ear filtering completely. This approach is probably the most appropriate when comparing the results to physiological tuning curves, as most physiological studies of cochlear tuning have not taken middle ear filtering effects into account. The second approach was to use the function described by Moore et al. (1997), based in part on the middle ear functions measured by Puria et al. (1997), which is similar to that used in earlier modeling studies (Glasberg and Moore 1990) and which has also been used in subsequent studies (Glasberg and Moore 2000). Because of its previous use, this approach is preferable when comparing our data with those of earlier notched-noise studies of behavioral frequency selectivity. In practice, the inclusion of the middle ear transfer function had no systematic effect on the estimated ERBs of our data (although it often affected the estimated asymmetry of the filters), so that the overall conclusions were not affected by our choice. For simplicity, and to facilitate comparisons with earlier behavioral studies, the analyses and parameters discussed in this article refer only to those conditions in which the middle ear transfer function was included.

Filter shapes.

Filter shapes were derived using methods similar to those described in many previous studies (e.g., Patterson and Nimmo–Smith 1980; Glasberg and Moore 1990, 2000). The basic assumed filter shape was the rounded exponential (roex) filter. Both the roex(p,r) (Patterson et al. 1982) and the roex(p,w,t) (Glasberg et al. 1984) were tested. In addition, a variant of the roex(p,w,t), as recently used by Rosen et al. (1998) and Glasberg and Moore (2000), was also tested. These three filter shapes are described in turn. The equation for one side of the roex(p,r) filter is

$$W(g) = (1 - r)(1 + pg)\,e^{-pg} + r \qquad (2)$$

where W is the filter weighting function, g is the deviation from the center frequency as a proportion of the center frequency, p is the parameter determining the slope of the filter, and r is the parameter defining the dynamic range of the filter. As in previous studies, p was allowed to differ on either side of the filter ($p_u$ for the upper side and $p_l$ for the lower side), whereas the value of r was assumed to be the same on both sides of the filter. The equation for the roex(p,w,t) filter is

$$W(g) = (1 - w)(1 + pg)\,e^{-pg} + w\,(1 + pg/t)\,e^{-pg/t} \qquad (3)$$

The difference between Eq. (2) and (3) is that Eq. (3) has two slopes instead of a dynamic range limiter. The parameter t determines the factor by which the second slope is shallower than the first; the parameter w determines the relative weights of the first and second slope, or the point on the filter function at which the second slope begins to dominate. Both t and w are assumed to be the same on both sides of the filter. Thus, the roex(p,w,t) function has one more free parameter than the roex(p,r) function. The variant of the roex(p,w,t) filter has a lower side as described by Eq. (3). Its upper side, however, is described by simply one slope:

$$W(g) = (1 + pg)\,e^{-pg} \qquad (4)$$

This version has the same number of parameters as the original roex(p,w,t) function, with the only difference being that the w and t parameters are not used for the upper side. This version is referred to as the roex(p,w,t,p) filter.
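The three filter forms and their ERBs can be sketched as below. The weighting functions assume the standard roex expressions: (1 − r)(1 + pg)exp(−pg) + r for roex(p,r), (1 − w)(1 + pg)exp(−pg) + w(1 + pg/t)exp(−pg/t) for roex(p,w,t), and (1 + pg)exp(−pg) for the single-slope upper side. The ERB helpers assume a unit-gain peak and integration over all g; since the integral of (1 + qg)exp(−qg) from 0 to infinity is 2/q, each side has a closed-form area. All parameter values in the example are hypothetical and are not the fitted values from this study.

```python
import math


def roex_pr(g, p, r):
    """One side of the roex(p,r) filter."""
    return (1.0 - r) * (1.0 + p * g) * math.exp(-p * g) + r


def roex_pwt(g, p, w, t):
    """One side of the roex(p,w,t) filter; the second slope is p / t."""
    return ((1.0 - w) * (1.0 + p * g) * math.exp(-p * g)
            + w * (1.0 + p * g / t) * math.exp(-p * g / t))


def roex_p(g, p):
    """Single-slope side, as used for the upper side of roex(p,w,t,p)."""
    return (1.0 + p * g) * math.exp(-p * g)


def erb_pwt(fc, p_l, p_u, w, t):
    """ERB in Hz of a roex(p,w,t) filter with shared w and t."""
    def side_area(p):
        return (1.0 - w) * 2.0 / p + w * 2.0 * t / p
    return fc * (side_area(p_l) + side_area(p_u))


def erb_pwtp(fc, p_l, p_u, w, t):
    """ERB in Hz of the roex(p,w,t,p) variant (single slope on the upper side)."""
    return fc * ((1.0 - w) * 2.0 / p_l + w * 2.0 * t / p_l + 2.0 / p_u)


def numeric_side_area(weight, g_max=5.0, n=50000):
    """Trapezoidal check of the area under one filter side from 0 to g_max."""
    h = g_max / n
    total = 0.5 * (weight(0.0) + weight(g_max))
    total += sum(weight(i * h) for i in range(1, n))
    return total * h


# Hypothetical parameters, chosen only for illustration.
fc, p_l, p_u, w, t = 1000.0, 30.0, 45.0, 0.01, 3.0
print(f"roex(p,w,t)   ERB = {erb_pwt(fc, p_l, p_u, w, t):.1f} Hz")
print(f"roex(p,w,t,p) ERB = {erb_pwtp(fc, p_l, p_u, w, t):.1f} Hz")
```

The roex(p,r) filter is left out of the ERB helpers because its constant floor r makes the area depend on the chosen integration limits.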

All three filter forms were tested with both individual and mean data. In almost all cases, the roex(p,r) fits were worse than the roex(p,w,t) fits, and the overall root mean squared (rms) deviations of the model predictions from the individual data were higher by about a factor of 2, which is more than expected given simply the difference in the number of free parameters.

Qualitatively, this can be appreciated in the data by noting that masker thresholds at larger notch widths continue to increase, but at a slower rate than at the smaller notch widths. In contrast, the roex(p,r) model requires the thresholds to increase initially at a roughly constant rate and then to remain constant. Because of the poorer fits of the roex(p,r) model, both qualitatively and quantitatively, only the roex(p,w,t) fits are discussed further in this article. As shown below, the roex(p,w,t) and roex(p,w,t,p) models provided comparably good fits to the data and resulted in similar estimates of filter bandwidth.

Fitting procedure.

A multidimensional nonlinear minimization routine [Nelder–Mead, as implemented in Matlab (Mathworks, Natick, MA)] was used to find the best-fitting parameters of the filters in the least-squares sense. It was assumed that the signal was detected by the filter with the best signal-to-noise ratio (SNR). In most situations this was the filter centered at the signal frequency. However, in some cases the filter with the best SNR was centered somewhat away from the signal frequency, although its CF was always within 10% of the signal frequency. The “efficiency” of the detector, K, is the threshold SNR at the output of the detection filter and was assumed to be constant across all conditions. The sum of the squared deviations of the predicted masker levels at threshold (in dB), given a constant K, from the measured thresholds was used to drive the minimization routine.
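The logic of the fit can be illustrated with a deliberately simplified sketch of the power-spectrum model. It departs from the procedure described above in several ways, all of which are assumptions made for brevity: the filter is a symmetric single-slope roex(p) shape, it is always centered on the signal (no search for the best-SNR filter), no middle ear function is applied, only symmetric notches are used, and a one-dimensional grid search over p replaces the Nelder–Mead routine. Because K enters the predicted threshold additively in dB, its least-squares value for a given p is simply the mean residual. The data are synthetic.

```python
import math

BAND = 0.25  # width of each noise band as a proportion of the signal frequency
NOTCHES = (0.0, 0.1, 0.2, 0.3, 0.4)  # symmetric notch conditions


def roex_p_area(p, a, b):
    """Integral of (1 + p*g) * exp(-p*g) from g = a to g = b."""
    def antiderivative(g):
        return -(2.0 + p * g) * math.exp(-p * g) / p
    return antiderivative(b) - antiderivative(a)


def masker_level_at_threshold(signal_db, k_db, p, fc, notch):
    """Predicted masker spectrum level (dB) for a fixed signal level.

    Power-spectrum model: signal = K + N0 + 10*log10(fc * filter area over the noise).
    """
    area = 2.0 * roex_p_area(p, notch, notch + BAND)  # two bands, symmetric filter
    return signal_db - k_db - 10.0 * math.log10(fc * area)


def fit_p_and_k(data, signal_db, fc):
    """Grid search over p (10.0 to 80.0 in steps of 0.1); K solved in closed form."""
    best = None
    for tenths in range(100, 801):
        p = tenths / 10.0
        residuals = [masker_level_at_threshold(signal_db, 0.0, p, fc, notch) - observed
                     for notch, observed in data]
        k_db = sum(residuals) / len(residuals)
        sse = sum((r - k_db) ** 2 for r in residuals)
        if best is None or sse < best[0]:
            best = (sse, p, k_db)
    rms = math.sqrt(best[0] / len(data))
    return best[1], best[2], rms


# Synthetic thresholds generated from known parameters (p = 40, K = -3 dB).
synthetic = [(d, masker_level_at_threshold(35.0, -3.0, 40.0, 1000.0, d)) for d in NOTCHES]
p_hat, k_hat, rms = fit_p_and_k(synthetic, 35.0, 1000.0)
print(f"p = {p_hat:.1f}, K = {k_hat:.2f} dB, rms error = {rms:.3f} dB")
```

A fuller implementation would add the second slope, separate lower and upper slopes constrained by the asymmetric-notch conditions, and the search over filter center frequencies.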

Model predictions and filter shapes

Auditory filter shapes in forward masking.

Data from the forward-masking conditions were collected from all eight listeners, and both the mean and individual data were used to produce auditory filter shapes with both the roex(p,w,t) and roex(p,w,t,p) filters. Both functions provided good descriptions of the data. As they each resulted in very similar overall rms errors for the individual data of around 0.9 dB, it was not possible to distinguish between them based on goodness of fit. The predicted thresholds are not shown in Figure 1, as the deviations between the predicted thresholds and the actual thresholds were always so small as to be essentially indistinguishable on the scale of the figure.

The ERB provides a convenient measure of filter tuning, although other measures, such as the 3-dB or 10-dB bandwidth, may also be used. All three measures were tried and all resulted in the same basic pattern of results and conclusions. For ease of comparison, only the ERBs are discussed here. Figure 4 shows four estimates of the ERB as a function of signal frequency. The symbols connected with solid lines denote the mean ERB values derived from the individual data, and the symbols connected with dashed lines denote the ERB values derived from fitting the function to the mean data. Squares and diamonds denote estimates using the roex(p,w,t) and roex(p,w,t,p) filters, respectively. Error bars for the symbols connected with solid lines, representing ±1 standard error (SE) of the arithmetic mean, are shown where they exceed the size of the symbol. The heavy curve with no symbols shows the ERB function as predicted by the equation given in Glasberg and Moore (1990), referred to here as the GM90 function.

Figure 4

Equivalent rectangular bandwidths (ERBs) of the filters derived from forward masking as a function of filter center frequency. Squares represent estimates using the roex(p,w,t) model and diamonds represent estimates using the roex(p,w,t,p) model; see text for details. Symbols connected with solid lines represent mean ERB estimates from the individual data, whereas symbols connected with dashed lines represent ERB estimates from the mean data. Error bars are shown only for symbols connected with solid lines and denote ±1 SE of the mean estimate. The heavy solid curve shows the predicted ERBs of the function proposed by Glasberg and Moore (1990).

The close correspondence between the mean ERBs (solid lines) and the ERBs of the mean data (dashed lines) provides some evidence for the validity and reliability of the fitting procedure. In general, there are no consistent differences in ERBs between the two filter functions [squares for roex(p,w,t); diamonds for roex(p,w,t,p)]; there is a tendency for the roex(p,w,t,p) model to produce somewhat larger ERB values at 4 and 6 kHz, but the differences are never greater than 20%. An obvious aspect of Figure 4 is the large difference between the ERBs from the present data and the GM90 function. At 1 kHz, the GM90 ERB is about a factor of 1.3 larger; at 8 kHz the difference is a factor of about 2.1. These differences in tuning are illustrated further in Figure 5. Here, the ERB (in Hz) has been replaced by a dimensionless value, denoted $Q_{\mathrm{ERB}}$. This is simply the filter center frequency divided by the ERB; a large $Q_{\mathrm{ERB}}$ value implies sharp frequency tuning. In this graph, the geometric mean $Q_{\mathrm{ERB}}$ values from the individual data fits are shown, together with ±1 SE of the geometric mean. As in Figure 4, squares and diamonds represent fits using the roex(p,w,t) and roex(p,w,t,p) filters, respectively. In contrast to the tuning of the GM90 function, which stays reasonably constant over the frequency range of interest, the new data suggest tuning that increases substantially from a $Q_{\mathrm{ERB}}$ value of about 10 at 1 kHz to a value of nearly 20 at 8 kHz.

Figure 5

Geometric mean values of $Q_{\mathrm{ERB}}$ from the roex(p,w,t) (squares) and roex(p,w,t,p) (diamonds) models. Error bars denote ±1 SE of the mean. The $Q_{\mathrm{ERB}}$ is a dimensionless measure of tuning, defined as the center frequency divided by the ERB. The heavy solid curve shows the predicted $Q_{\mathrm{ERB}}$, as defined by Glasberg and Moore (1990). The light solid line and the dashed lines represent the best-fitting power function to the pooled estimates and the 95% confidence intervals, respectively.

Shera and Guinan (2003) found that a power law described well the changes in $Q_{\mathrm{ERB}}$ in cat and guinea-pig auditory-nerve tuning data as a function of CF. A power law could also describe human tuning data derived indirectly from otoacoustic emissions data (Shera et al. 2002). The same type of fit was used here. The equation used is

$$Q_{\mathrm{ERB}} = \beta F^{\alpha} \qquad (5)$$

where F is the CF in kHz and α and β are free parameters. Given that there was no compelling reason to select one filter type over the other, individual estimates from both the roex(p,w,t) and the roex(p,w,t,p) models were pooled. The solid and dashed lines in Figure 5 represent the best-fitting power-law function and the 95% confidence intervals, respectively. The parameters used are α = 0.27 ± 0.06 and β = 11.1 ± 0.9. On the log coordinates used in Figure 5, the function is a straight line, with α determining the slope and β determining the intercept or vertical position of the line. If sharpness of tuning did not change as a function of frequency, α would be close to zero.
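The fitted power law can be evaluated directly and set against the tuning implied by the GM90 function. The sketch assumes the power-law form Q = βF^α with F in kHz, uses the central parameter estimates quoted above (α = 0.27, β = 11.1), and assumes the GM90 expression ERB = 24.7(4.37F + 1) with F in kHz; the function names are ours.

```python
ALPHA = 0.27  # exponent: slope of the line on log-log coordinates
BETA = 11.1   # Q_ERB at 1 kHz


def q_erb_power_law(f_khz, alpha=ALPHA, beta=BETA):
    """Q_ERB from the assumed power-law form beta * F**alpha, F in kHz."""
    return beta * f_khz ** alpha


def q_erb_gm90(f_khz):
    """Q_ERB implied by the assumed GM90 form ERB = 24.7 * (4.37 * F + 1)."""
    erb_hz = 24.7 * (4.37 * f_khz + 1.0)
    return 1000.0 * f_khz / erb_hz


for f_khz in (1.0, 2.0, 4.0, 6.0, 8.0):
    q_fit = q_erb_power_law(f_khz)
    erb_fit = 1000.0 * f_khz / q_fit
    print(f"{f_khz:.0f} kHz: power-law Q = {q_fit:5.2f} (ERB = {erb_fit:6.1f} Hz), "
          f"GM90 Q = {q_erb_gm90(f_khz):5.2f}")
```

Because the parameters carry uncertainty (±0.06 and ±0.9), values computed this way are point estimates from the fitted line and need not coincide with the individual data points in Figure 5.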

The filter shapes derived from the mean data for both the roex(p,w,t) model (solid lines) and the roex(p,w,t,p) model (dotted lines) are shown in Figure 6. Again, the increase in the sharpness of tuning with increasing CF is evident. The filter parameters and ERBs from the mean data are given in Table 1. Both Figure 6 and Table 1 indicate that the lack of a second slope on the upper side generally made the roex(p,w,t,p) more symmetric than the roex(p,w,t) around the peak (the values of $p_u$ and $p_l$ are more similar). No other systematic differences are apparent.

Table 1 Model parameters derived from the mean forward-masking data

Figure 6

Auditory filter shapes with center frequencies between 1 and 8 kHz, derived from the mean data in the forward-masking condition. Solid and dotted curves denote roex(p,w,t) and roex(p,w,t,p) filters, respectively.

Auditory filter shapes in simultaneous masking.

As with the forward-masking data, both the roex(p,w,t) and the roex(p,w,t,p) functions provided good fits to the simultaneous-masking data, with rms errors, pooled across all subjects, of about 0.9 and 0.6 dB for the 10-dB SL and 35-dB SL conditions, respectively. The mean $Q_{\mathrm{ERB}}$ values from the roex(p,w,t,p) model for both the 10- and 35-dB SL conditions are shown in Figure 7 as downward- and upward-pointing triangles, respectively. $Q_{\mathrm{ERB}}$ values from the roex(p,w,t) model (not shown) were very similar. For comparison, the $Q_{\mathrm{ERB}}$ values from the forward-masking condition (diamonds), also with the roex(p,w,t,p) model, together with the predictions of the GM90 function (solid curve), are replotted from Figure 5. As expected, the $Q_{\mathrm{ERB}}$ values derived from simultaneous masking are smaller than those derived from forward masking, indicating broader tuning. The tuning still appears to be marginally sharper than that predicted by the GM90 function, a difference which seems to increase with increasing frequency. This may well be due to the relatively low masker levels used in the present study. It is known that filter shapes broaden at higher levels, especially at higher frequencies, and the GM90 function is specifically designed to predict tuning in the presence of a masker with a constant spectrum level of around 30 dB SPL. Even in our 35-dB SL signal condition, the masker spectrum level in the no-notch condition was generally around 10 dB lower than that.

Figure 7

Geometric mean $Q_{\mathrm{ERB}}$ values in simultaneous masking with the signal level fixed at 10 dB SL (down-pointing triangles) or 35 dB SL (up-pointing triangles). Error bars denote ±1 SE of the mean across the four listeners in each condition. Values of roex(p,w,t,p) $Q_{\mathrm{ERB}}$ from the forward-masking condition (diamonds) and the Glasberg and Moore predictions (solid curve) are replotted from Figure 5 for comparison.

At first glance it appears that the filters from the 35-dB SL condition may be somewhat sharper than those from the 10-dB SL condition, despite considerable overlap of the error bars. Closer inspection of the individual data, however, reveals that the apparent difference is probably due to individual differences in frequency selectivity: Recall that the two simultaneous-masking conditions were completed by two different groups of listeners. In fact, mean forward-masking ERBs in listeners who completed the 10-dB simultaneous-masking condition were somewhat broader than those for listeners in the other group, although not significantly so (t-test, p > 0.05 at all CFs). This point is illustrated in Figure 8, which shows the individual ratios of the ERBs for the simultaneous masker to the ERBs for the forward masker, using the roex(p,w,t,p) model in all cases. Filled and open symbols represent individual listeners in the 10-dB and 35-dB SL simultaneous-masking conditions, respectively. The geometric mean ratios for the 10-dB and 35-dB conditions are shown by the solid and dashed lines, respectively. It can be seen that there is no systematic difference between the ratios from the two conditions. This in turn suggests that the ERB, as measured here with simultaneous masking, does not change much with level at very low levels. This is consistent with the idea that the input–output function of the basilar membrane is approximately linear for the first 20–40 dB above threshold, as suggested by both physiological (Ruggero et al. 1997) and psychophysical studies (Oxenham and Plack 1997; Plack and Oxenham 1998). Of course, our measure of tuning, the ERB or $Q_{\mathrm{ERB}}$, depends more on the shape of the filter tip than that of the tail. This makes the measure relatively insensitive to certain changes in filter shape, such as a change in the tip-to-tail ratio (Rosen et al. 1998), which is closely related to the filter parameter w. Inspection of this parameter across the two simultaneous-masking conditions did not reveal a consistent trend, although this too may be due to the across- rather than within-subject nature of the comparison. The lack of an effect of level on the parameter w can be observed in the fits to the mean data, where the best-fitting parameters for the 10-dB and 35-dB simultaneous-masking conditions are shown in Tables 2 and 3, respectively.
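The relative insensitivity of the ERB to the tail weight w can be illustrated numerically. The sketch below is only an illustration: it assumes the generic roex(p,w,t) weighting function from the notched-noise literature, with a tail on one side only, and integrates each side analytically over an unbounded range (an approximation). The function names and all parameter values are hypothetical and are not taken from Tables 1 to 3.

```python
def side_area(p, w=0.0, t=1.0):
    """Area under one side of W(g) = (1 - w)(1 + p*g)exp(-p*g) + w(1 + t*g)exp(-t*g).

    g is the normalized deviation |f - fc| / fc. Each (1 + x*g)exp(-x*g) term
    integrates to 2/x over g in [0, inf), so the area has a closed form.
    """
    return (1.0 - w) * 2.0 / p + w * 2.0 / t


def erb_hz(fc_hz, p_tailed, p_plain, w, t):
    """ERB in Hz. Because W(0) = 1, the ERB equals the total area times fc."""
    return fc_hz * (side_area(p_tailed, w, t) + side_area(p_plain))


# Hypothetical values: raise the tail weight by 20 dB (w from 1e-4 to 1e-2)
# while keeping the tip slopes fixed.
fc, p, t = 4000.0, 60.0, 10.0
erb_low_tail = erb_hz(fc, p, p, w=1e-4, t=t)
erb_high_tail = erb_hz(fc, p, p, w=1e-2, t=t)

print(f"ERB with w = 1e-4: {erb_low_tail:.1f} Hz")
print(f"ERB with w = 1e-2: {erb_high_tail:.1f} Hz")
print(f"Relative change:   {100 * (erb_high_tail / erb_low_tail - 1):.1f}%")
```

With these illustrative numbers, a 20-dB change in the tail weight alters the ERB by only a few percent, which is why the ERB is a poor indicator of changes in the tip-to-tail ratio.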

Figure 8
Individual ratios of simultaneous-masking ERBs to forward-masking ERBs. Filled symbols denote listeners who participated in the 10-dB SL simultaneous-masking condition; open symbols denote listeners who participated in the 35-dB SL simultaneous-masking condition. Geometric mean ratios for the 10-dB and 35-dB SL conditions are shown by the solid and dashed lines, respectively. All estimates are from the roex(p,w,t,p) model.

One change that does seem to occur with level in the simultaneous-masking conditions relates to the asymmetry of the auditory filter. The 10-dB simultaneous-masking condition (Table 2) consistently yielded asymmetries in the opposite direction to those found for the forward-masking conditions. The asymmetries for the 35-dB simultaneous-masking conditions were less marked and, at least for the roex(p,w,t) model, were in the same direction as for the forward-masking conditions.

Table 2 Filter parameters derived from the mean simultaneous-masking data, with the signal presented at 10 dB SL

Values of K in forward and simultaneous masking

Within the power spectrum model, the value of K represents the detector efficiency, or the threshold signal-to-noise ratio at the filter output. While its meaning is clear for long-duration signals embedded in long-duration noise, the definition becomes problematic for short-duration signals and becomes even more so in the case of forward masking. The values of K quoted in the tables are simply the raw signal-to-noise ratios at threshold, without regard to differences in duration or temporal position. In general, there is a trend for values of K to decrease with increasing CF, which is more pronounced in the forward-masking condition than in the simultaneous-masking conditions. However, to be able to compare values of K across conditions and studies, the effects of temporal integration would have to be accounted for, as would the decay of forward masking. Thus, without making further assumptions about the nature of processing, it is possible to make only qualitative observations about trends in K within a given condition.
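For reference, the role of K can be made explicit by writing the power spectrum model in its usual textbook form (e.g., Patterson 1976). The symbols $P_s$ (signal power at threshold), $N(f)$ (masker power spectral density), and $W(f)$ (auditory filter weighting function) are notation assumed here for illustration and are not defined elsewhere in this section:

```latex
P_s = K \int_{0}^{\infty} N(f)\, W(f)\, \mathrm{d}f
```

In this form, K is the ratio of signal power to the masker power passed by the filter at threshold. That ratio is straightforward to interpret only when signal and masker overlap in time and share a long duration, which is why the values in the tables are treated as raw ratios.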

DISCUSSION

The purpose of this study was to reexamine notched-noise measures of frequency selectivity by using a behavioral technique that matched physiological measures of tuning as closely as possible. This was achieved by using forward masking to avoid suppressive interactions between the masker and signal and by presenting the signal at a fixed low level (10 dB above threshold in quiet). The resulting auditory filters show substantially narrower tuning than those derived from earlier studies employing simultaneous masking, often at higher levels. Certain aspects and consequences of the findings are discussed below.

Frequency selectivity as a function of CF

Although it has been known for some time that forward masking produces sharper tuning than does simultaneous masking, the differences have not before been studied systematically as a function of frequency. Perhaps the most important finding of this study is that frequency tuning increases substantially with CF between 1 and 8 kHz. Because of this, the difference between the frequency selectivity implied by the GM90 equation and that found here increases with increasing frequency, as shown in Figures 5 and 6.

Our estimated ERB of about 100 Hz at a signal frequency of 1 kHz is in excellent agreement with the only other study of notched-noise forward masking at low masker levels (Glasberg and Moore 1982); as mentioned in the Introduction, they also found the ERB at 1 kHz to be about 100 Hz. To our knowledge, no other notched-noise studies have examined frequency selectivity in forward masking at frequencies other than 1 kHz. One study involving psychophysical tuning curves, with a low-level notched noise added to avoid possible confounding factors such as off-frequency listening and “confusion” effects (e.g., Neff 1986), measured tuning at 0.5, 1, 2, and 4 kHz in two listeners (Moore et al. 1984). Unfortunately, although the bandwidth estimates at 1 kHz were similar to ours, at 4 kHz the estimates of tuning in the two listeners differed from each other by a factor of 2 (3-dB bandwidths of 210 and 440 Hz), making it difficult to draw conclusions regarding changes in bandwidth with frequency.

Although our forward-masking data show generally sharper tuning than our simultaneous-masking data, all three conditions (forward masking, simultaneous masking at 10 dB SL, and simultaneous masking at 35 dB SL) show an increase in tuning with increasing CF. This can be seen directly in Figure 7 and indirectly in Figure 8, in that there seems to be no consistent trend for the ratio of ERBs between forward and simultaneous masking to change as a function of CF. Thus, the difference between our data and those of previous studies cannot be ascribed wholly to the difference between simultaneous and nonsimultaneous masking and the resultant effects of suppression. The difference may also lie in our levels of signal presentation, which were generally lower than in previous studies, and in the fact that the signal level was fixed and the masker level was varied. If changes in filter bandwidth with level are more pronounced at high than at low CFs (e.g., Hicks and Bacon 1999), then changes in tuning with CF should be more pronounced at low levels.

Filter asymmetry in forward and simultaneous masking

While the estimated ERBs and the goodness-of-fit of the roex(p,w,t) and roex(p,w,t,p) models were very similar, one difference involved the estimated amount of asymmetry at the tip of the filter. In general, roex(p,w,t) fits produced lower values of the ratio of asymmetry, pu/pl, indicating a steeper upper filter slope relative to the lower filter slope. This can be understood in terms of the additional, shallower slope on the upper side of the roex(p,w,t) function.

Glasberg and Moore (2000) concluded that a function equivalent to the roex(p,w,t,p) function used here could produce good fits to the data if the asymmetry ratio was fixed at unity, i.e., if the tip of the filter was assumed to be symmetric. An examination of the functions derived from the forward-masking (Table 1) and the 35-dB simultaneous-masking (Table 3) conditions suggests that the same may be true of our data: The values of pu and pl do not seem to deviate systematically from one another, except at 8 kHz in the forward-masking condition. Thus, the filters are somewhat asymmetric, with steeper upper than lower slopes; this asymmetry can be expressed in terms of either a filter with an asymmetric tip region, as for the roex(p,w,t), or a filter with a symmetric tip region but an asymmetric tail region, as for the roex(p,w,t,p).

Table 3 Filter parameters derived from the mean simultaneous-masking data, with the signal presented at 35 dB SL

The situation is different in the 10-dB simultaneous-masking condition. Here, both filter functions suggest an asymmetry in the opposite direction: The upper slope appears to be shallower than the lower slope. Similar results have been found before in simultaneous masking (e.g., Shailer et al. 1990), and the reversal of asymmetry at low levels is included in Glasberg and Moore’s (1990) analysis of auditory filter shapes [their Eq. (5)]. The fact that this asymmetry reversal is found only in simultaneous masking suggests that it too may be related to suppression effects. Both physiological and psychophysical studies of suppression have shown that high-side suppression (where the suppressor is above the signal in frequency) occurs at lower sound pressure levels than low-side suppression (e.g., Duifhuis 1980; Delgutte 1990). If suppression broadens the apparent tuning, then it should affect the upper slope of the filter at lower levels than it affects the lower slope of the filter, leading to the apparent asymmetry reversal. In conclusion, the asymmetry reversal at low levels is probably a consequence of suppression and does not reflect cochlear tuning per se.

Comparisons with physiological data from humans and animals

The present work was motivated by a recent study of stimulus-frequency otoacoustic emissions (SFOAEs) in humans, cats, and guinea pigs (Shera and Guinan 2003). This study found that the frequency dependence of SFOAE group delays correlates well with changes across CF in the bandwidths of auditory-nerve tuning curves in cat and guinea pig. Shera et al. (2002) showed that although SFOAE-based estimates of human cochlear tuning do not match well with the GM90 function, the otoacoustic estimates are in very good agreement with the filters derived from the present data. This provides independent physiological support for the idea that the present behavioral data in forward masking reflect tuning at the level of the cochlea.

The study of Shera et al. (2002) is one of very few studies to directly compare physiological with behavioral measures of frequency selectivity. Similar work has been done in the guinea pig by Evans and colleagues (Evans et al. 1992; Evans 2001). They found good correspondence between auditory-nerve tuning curves and behavioral measures using simultaneous masking. This good correspondence may seem at odds with our assertion that suppression effects in simultaneous masking can lead to a severe underestimate of the sharpness of cochlear tuning. The apparent discrepancy may be resolved by considering that guinea-pig cochlear tuning is generally poorer than that found in humans by a factor of 2 or 3 (Shera et al. 2002). Furthermore, the amount of suppression found in guinea-pig auditory-nerve fibers is often rather small (e.g., Prijs 1989). Thus, differences between forward and simultaneous masking may not be as apparent in that species as they are in humans.

Implications of the revised tuning estimates

The data and analysis presented here suggest that the GM90 function does not provide a good estimate of human cochlear tuning at low levels. A comparison of the GM90 function and, for instance, auditory-nerve tuning curves is therefore not appropriate. Instead, it seems that when the effects of suppression are eliminated by using forward masking, and when low signal levels are used, human cochlear tuning is considerably sharper than previously thought. While it has been known for many years that forward masking produces sharper estimates of tuning than does simultaneous masking, the present results provide the first evidence that tuning, as measured by the $Q_{\mathrm{ERB}}$, becomes markedly sharper with increasing frequency between 1 and 8 kHz. Although this is a new finding, it brings the pattern of human cochlear tuning in line with that found in other mammals, such as cat and guinea pig, where tuning also increases substantially with increasing CF (see Shera et al. 2002).

There are a number of situations in which tuning estimates derived from simultaneous masking, such as the GM90 function, remain valid and more appropriate than those from the present study. For instance, in situations where predictions of audibility are required for one high-level stimulus embedded in another, it is of interest whether the stimulus is masked, not whether the masking is due to suppression or excitation, or whether the cochlea is tuned more sharply at low levels. Thus, in applications such as low bit-rate coding of digital audio, where the coding noise must fall below masked or absolute threshold, the current procedures (and the GM90 function) remain valid under most everyday situations.

There are, however, a number of situations in which the use of the estimates from the present study would be more appropriate. The most obvious example involves models of human cochlear processing. The filter shapes derived from the present data provide a basis for defining the basic frequency-tuning properties of such models, just as neural tuning curve data do for cochlear models in animals. In order to account for the differences between simultaneous and forward masking, such models will probably also need to incorporate realistic implementations of suppression mechanisms. Further studies with forward masking, using either higher signal levels or longer gaps between the masker and signal, will be required to define how tuning changes with stimulus level. It is expected that filters will broaden considerably, given previous data using forward-masking psychophysical tuning curves at high levels (e.g., Nelson 1991). The technique used here is difficult to use at frequencies below 1 kHz, as the short duration of the signal results in a bandwidth exceeding the bandwidth of the auditory filter. However, it is possible that the problem can be overcome using the pulsation-threshold method (Houtgast 1972), which has been used recently by Plack and Oxenham (2000) to provide a behavioral estimate of basilar-membrane compression at low frequencies.

Many models of auditory perception include auditory filtering as the first stage. In most cases, it is assumed that the masking patterns found in simultaneous masking reflect the excitation pattern along the basilar membrane (Zwicker 1960; Florentine and Buus 1981; Moore et al. 1997). Thus, the frequency selectivity assumed in such models is similar to the functions derived from simultaneous-masking data, such as the GM90 function. The present data support earlier assertions (Moore and Vickers 1997; Oxenham and Plack 1998) that the frequency selectivity assumed by such models is incorrect and will require revision.

Finally, the method developed here may be of use in testing the frequency selectivity of listeners with mild-to-moderate hearing loss. In previous studies of hearing-impaired listeners, it has been found that frequency selectivity is generally not greatly affected for hearing losses of 35 dB or less (e.g., Moore et al. 1999). However, it is possible that the apparent insensitivity of frequency selectivity to mild hearing loss may be due to the measurement technique used in previous studies. If suppression in the normal cochlea results in an underestimate of frequency tuning, then a reduction in suppression, concomitant with cochlear hearing loss, may compensate for the decrease in tuning that would otherwise be observed. Recently, Baker and Rosen (2002) showed, using simultaneous masking, that the dependence of frequency selectivity on level could be affected by hearing losses as small as 20 dB. It may be that forward masking will prove to be a more sensitive measure of the effects of hearing loss on frequency selectivity, such that changes in tuning are more readily apparent even for mild hearing losses. If frequency selectivity is determined primarily by outer hair cell function, such a measure may help determine the different contributions of various physiological mechanisms to an individual hearing loss.

SUMMARY

The masker levels necessary to mask a fixed low-level signal between 1 and 8 kHz were measured for eight listeners in forward and simultaneous masking with a varying notched-noise masker. The data were used to derive auditory filter shapes. The following conclusions were drawn:

  • The derived tuning from the forward-masking conditions was considerably sharper than has been found in previous studies using simultaneous masking.

  • In forward masking, the dependence of tuning on CF was different from that found in earlier studies using higher-level simultaneous maskers (e.g., Glasberg and Moore 1990). Instead of near-constant relative bandwidth, the sharpness of tuning doubled over the range of frequencies tested, with $Q_{\mathrm{ERB}}$ values increasing from around 10 at 1 kHz to around 20 at 8 kHz. The revised estimates of human cochlear tuning as a function of CF between 1 and 8 kHz can be summarized as $Q_{\mathrm{ERB}} = 11F^{0.27}$, where F is the filter CF in kHz.

  • In simultaneous masking, tuning was broader than that found in forward masking, although the dependence of tuning on CF was similar. The difference between these results and previous results using simultaneous masking may be due to our use of a low fixed-level signal.

  • The results may be of use in constructing models of the human cochlea and should facilitate comparisons of cochlear tuning between humans and other animals.
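The summary function above can be compared numerically with the Glasberg and Moore (1990) function. The following is a minimal sketch, assuming the commonly quoted form of that function, ERB = 24.7(4.37F + 1) Hz with F in kHz. The function names and the choice of frequencies are illustrative, and the forward-masking function should be applied only between 1 and 8 kHz.

```python
def q_erb_forward(f_khz):
    """Q_ERB from the forward-masking summary function, Q_ERB = 11 * F**0.27 (F in kHz)."""
    return 11.0 * f_khz ** 0.27


def erb_gm90_hz(f_khz):
    """ERB in Hz from the commonly quoted GM90 form, 24.7 * (4.37 * F + 1) (assumed here)."""
    return 24.7 * (4.37 * f_khz + 1.0)


def q_erb_gm90(f_khz):
    """Q_ERB implied by GM90: center frequency in Hz divided by the ERB in Hz."""
    return 1000.0 * f_khz / erb_gm90_hz(f_khz)


print("CF (kHz)  Q_ERB fwd  ERB fwd (Hz)  Q_ERB GM90  ERB GM90 (Hz)")
for f in (1.0, 2.0, 4.0, 8.0):  # illustrative frequencies within the fitted range
    q_fwd = q_erb_forward(f)
    erb_fwd = 1000.0 * f / q_fwd  # ERB = CF / Q_ERB
    print(f"{f:8.1f}  {q_fwd:9.1f}  {erb_fwd:12.1f}  {q_erb_gm90(f):10.1f}  {erb_gm90_hz(f):13.1f}")
```

Under these assumptions, the ratio between the two Q_ERB estimates grows with CF, reflecting the near-constant relative bandwidth of GM90 above 1 kHz and the steady sharpening described by the forward-masking function.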