1 Introduction

Interaural time and level differences (ITDs and ILDs) are the major cues for horizontal sound localization. Sensitivities to ITDs and ILDs, evaluated by lateralization tasks, often vary markedly among listeners. Lateralization performance based on ITDs and ILDs should reflect not only the listener’s ability to compare time and level information, respectively, between ears but also the efficiency of encoding information about the temporal structure and intensity of stimuli at stages below binaural interactions in auditory processing. Our earlier study attempted to evaluate the relative contributions of these processing stages to the inter-listener variability in lateralization performance by comparing individual listeners’ monaural sensitivities to the temporal structure and intensity of a sound stimulus with their ITD and ILD sensitivities (Ochi et al. 2014). The results showed a significant correlation between ILD discrimination thresholds and thresholds for a monaural level-increment detection task. This could be interpreted as indicating that the inter-individual differences in ILD sensitivity could be (partially) accounted for by the level coding efficiency at stages before binaural interaction. Similarly, ITD discrimination thresholds were found to correlate with the listeners’ sensitivities to the temporal structure of monaural stimuli when the stimuli were in the high-frequency range (around 4000 Hz). However, we failed to find a positive correlation for stimuli in the low-frequency range (around 1100 Hz).

The present study extends our earlier study (Ochi et al. 2014) by incorporating early-elderly listeners under essentially the same experimental settings. We adopted early-elderly listeners because they would generally exhibit deteriorated sensitivities to the temporal structures of stimuli, while their audiometric thresholds remain within a normal to mildly impaired range. We first examined the effects of age on the performance of individual tasks. We then analysed correlations between task performances, as in the earlier study. It has been reported that sensitivities to the temporal structure and intensity of stimuli decline with age (e.g., Hopkins and Moore 2011). A population including young and elderly listeners would therefore exhibit a large variability of thresholds in the monaural tasks, which would lead to improved sensitivity of the correlation analyses and provide further insights into the roles of monaural processing in ITD or ILD discrimination. Supplemental data were also obtained to evaluate underlying mechanisms for the monaural tasks.

As in the earlier study, we measured listeners’ performances in four basic tasks, namely ITD, ILD, TIME, and LEVEL tasks, which would reflect sensitivities to ITDs, ILDs, the temporal structure, and the level change of stimuli, respectively. Low- and high-frequency stimuli were tested, which were centred at around 1100 and 4000 Hz, respectively. Supplementary experiments measured frequency resolution (FRES task), frequency discrimination limens (FDISC task), and (for low-frequency stimulus only) the discrimination threshold of Huggins pitch (HUGGINS task; Cramer and Huggins 1958).

2 Methods

2.1 Listeners

Forty-three adults participated in the experiment. All gave written informed consent; the procedure was approved by the Ethics Committee of NTT Communication Science Laboratories. The participants comprised 22 normal-hearing young listeners (referred to as the YNH group; 10 males and 12 females; 19–43 years old, mean 32.0) and 21 elderly listeners (11 males and 10 females; 60–70 years old, mean 63.0). The data from the YNH listeners have been reported in the earlier study (Ochi et al. 2014). In the analyses, the elderly listeners were further divided into normal-hearing (referred to as ENH) and hearing-impaired (EHI) groups. Listeners with audiometric thresholds of < 30 dB HL at all the frequencies between 125 and 4000 Hz in both ears were classified as normal-hearing; otherwise, as (mildly) hearing-impaired. For the FDISC and HUGGINS tasks, a subset of YNH listeners (N = 12) and all the elderly listeners participated. Table 1 summarizes the means and standard deviations of hearing levels obtained by pure-tone audiometry.

Table 1 Means and standard deviations of hearing levels for the three listener groups. Columns represent, from left to right, averages across all the frequencies, the 1000-Hz tone, and the 4000-Hz tone, respectively

2.2 Stimuli

Stimuli were presented to the listener through headphones. Except for the binaural tasks (i.e., ITD, ILD, and HUGGINS tasks), the stimuli were presented to the right ear.

The four main tasks (namely, ITD, ILD, TIME, and LEVEL tasks) employed two types of stimuli, referred to as the low- and high-frequency stimuli, which were identical to those used in our earlier study (Ochi et al. 2014) and are thus only described briefly here. The stimuli were designed to assess, in the ITD and TIME tasks, the listener’s ability to use information based on neural phase locking to the stimulus temporal structure. Essentially the same stimuli were also used in the ILD and LEVEL tasks. The low-frequency stimulus was a spectrally shaped multicomponent complex (SSMC), which was a harmonic complex with a fundamental frequency (F0) of 100 Hz. The spectral envelope had a flat passband and sloping edges (5×F0, centred at 1100 Hz). The overall level of the complex was 54 dB SPL. Threshold equalizing noise, extending from 125 to 15,000 Hz, was added. The high-frequency stimulus was a “transposed stimulus,” which was a 4-kHz tone carrier amplitude-modulated with a half-wave rectified 125-Hz sinusoid (Bernstein and Trahiotis 2002). It is considered that the auditory-nerve firing is phase locked to the modulator waveform, which provides the cue for judging the ITD and modulation rate of the stimulus. The overall level of the transposed stimulus was set to 65 dB SPL. A continuous, low-pass filtered Gaussian noise was added to prevent the listener from using any information at low spectral frequencies (e.g., combination tones).
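As an illustration of the high-frequency stimulus described above, the following minimal sketch multiplies a 4-kHz carrier by a half-wave rectified 125-Hz sinusoid. The sampling rate, the function name, and the unit peak amplitude are assumptions made for the example; level calibration, onset and offset ramps, any low-pass filtering of the rectified modulator, and the added noise are omitted.

```python
import math

def transposed_stimulus(duration_s=0.4, fs=48000, fc=4000.0, fm=125.0):
    """Return samples of a transposed tone (carrier x half-wave rectified modulator).

    fs (sampling rate) is an assumed value; it is not given in the text.
    """
    n = int(round(duration_s * fs))
    samples = []
    for i in range(n):
        t = i / fs
        # Half-wave rectification keeps only the positive half-cycles of the modulator.
        modulator = max(0.0, math.sin(2.0 * math.pi * fm * t))
        carrier = math.sin(2.0 * math.pi * fc * t)
        samples.append(modulator * carrier)
    return samples

stim = transposed_stimulus()
```

With these values one modulator period spans 384 samples, and the carrier is silent during the second half of each period, which is what gives the envelope its pulsatile shape.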

Stimuli used for the supplementary tasks (namely, FRES and FDISC tasks) involved tone-burst signals at frequencies of 1100 and 4000 Hz. For these two tasks specifically, the low- and high-frequency stimuli refer to the 1100- and 4000-Hz tones, respectively. The frequency band of interest in the HUGGINS task (another supplementary task) was centred at 1100 Hz. Other details about the stimuli for the FRES, FDISC, and HUGGINS tasks are described in the next subsection.

2.3 Procedures

2.3.1 General Procedure

A two-interval two-alternative forced-choice (2I-2AFC) method was used to measure the listener’s sensitivities to stimulus parameters. Feedback was given to indicate the correct answer after each response. The two-down/one-up adaptive tracking method was used to estimate discrimination thresholds.
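The two-down/one-up rule can be sketched as follows. The tracked variable is decreased after two consecutive correct responses and increased after each incorrect response, so the track converges on the level yielding about 70.7 % correct. The multiplicative step, the starting value, and the function name are assumptions for illustration; the actual step sizes and stopping rule are not given in this section.

```python
def two_down_one_up(responses, start, step):
    """Run a 2-down/1-up track over a list of booleans (True = correct).

    Returns (levels, reversals): the level presented on each trial and the
    levels at which the track changed direction. A multiplicative step is
    assumed here.
    """
    level = start
    levels, reversals = [], []
    correct_run = 0
    last_direction = 0  # -1: last move was down, +1: up, 0: no move yet
    for correct in responses:
        levels.append(level)
        if correct:
            correct_run += 1
            if correct_run < 2:
                continue          # one correct response alone does not move the track
            correct_run = 0
            direction = -1        # two in a row: make the task harder
        else:
            correct_run = 0
            direction = +1        # any error: make the task easier
        if last_direction != 0 and direction != last_direction:
            reversals.append(level)
        last_direction = direction
        level = level / step if direction < 0 else level * step
    return levels, reversals

levels, reversals = two_down_one_up(
    [True, True, True, True, False, True, True, False], start=8.0, step=2.0)
```

A threshold estimate is then typically taken as an average of the levels at the later reversals.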

2.3.2 Task Specific Procedures

2.3.2.1 ITD Task

In a 2I-2AFC trial, stimuli in the two intervals had ITDs of + ΔITD/2 and − ΔITD/2 μs. Each stimulus was 400-ms long, including 100-ms raised-cosine onset and offset ramps, which were synchronized between the two ears. The listeners were required to indicate the direction of the ITD change between the two intervals on the basis of the laterality of sound images.
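A minimal sketch of how an ongoing ITD can be imposed while keeping the onset and offset ramps synchronized between the ears: the waveform is shifted in time in opposite directions for the two ears, and the same raised-cosine gate is applied to both. The sign convention (positive ITD means the right ear leads), the sampling rate, and the function names are assumptions.

```python
import math

def raised_cosine_gate(t, duration, ramp):
    """Gate value in [0, 1] at time t for a stimulus of the given duration."""
    if t < 0.0 or t > duration:
        return 0.0
    if t < ramp:
        return 0.5 * (1.0 - math.cos(math.pi * t / ramp))
    if t > duration - ramp:
        return 0.5 * (1.0 - math.cos(math.pi * (duration - t) / ramp))
    return 1.0

def apply_itd(waveform, itd_us, duration=0.4, ramp=0.1, fs=48000):
    """Return (left, right) sample lists; waveform is a function of time in s."""
    half = 0.5 * itd_us * 1e-6
    left, right = [], []
    for i in range(int(round(duration * fs))):
        t = i / fs
        gate = raised_cosine_gate(t, duration, ramp)  # identical in both ears
        left.append(gate * waveform(t - half))        # left ear lags for itd_us > 0
        right.append(gate * waveform(t + half))       # right ear leads for itd_us > 0
    return left, right

tone = lambda t: math.sin(2.0 * math.pi * 500.0 * t)  # placeholder waveform
left_a, right_a = apply_itd(tone, +50.0)   # interval with ITD = +dITD/2
left_b, right_b = apply_itd(tone, -50.0)   # interval with ITD = -dITD/2
```

Here the two calls correspond to the two intervals of a trial with ΔITD = 100 μs; the placeholder tone stands in for the SSMC or the transposed stimulus.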

2.3.2.2 ILD Task

Similarly to the ITD task, stimuli in the two intervals had ILDs of + ΔILD/2 and − ΔILD/2 dB. Each stimulus was 400-ms long, including 20-ms raised-cosine onset and offset ramps. The listeners were required to indicate the direction of the ILD change between the two intervals on the basis of the laterality of sound images.
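For illustration, an ILD can be converted to a pair of linear amplitude gains as below. Splitting the ILD symmetrically between the ears and treating a positive ILD as favouring the right ear are assumptions; the text specifies only the ILD of each interval.

```python
import math

def ild_gains(ild_db):
    """Return (left_gain, right_gain) as linear amplitude factors."""
    half = ild_db / 2.0
    # Assumed convention: positive ILD raises the right ear and lowers the left.
    return 10.0 ** (-half / 20.0), 10.0 ** (half / 20.0)

gain_left, gain_right = ild_gains(6.0)
```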

2.3.2.3 TIME Task

For the low-frequency stimulus, the listeners were required to detect a common upward frequency shift (Δf Hz) imposed on the individual components of the SSMC with the spectral envelope remaining unchanged (Moore and Sek 2009). It was assumed that the listeners based their judgments on pitch changes, reflecting the temporal fine structure encoded as the pattern of neural phase locking. The “signal” and “non-signal” intervals in the 2I-2AFC method contained RSRS and RRRR sequences, respectively, where R indicates the original SSMC and S indicates a frequency-shifted SSMC. For the high-frequency stimulus, the listener’s task was to discriminate the modulation frequencies of the transposed stimuli between fm (= 125 Hz) and fm + Δf Hz, referred to as R and S, respectively. Each R and S had a duration of 100 ms, including 20-ms raised-cosine ramps. The threshold was expressed as Δf/F0 or Δf/fm for the low- or high-frequency stimuli, respectively. When adaptive tracking failed to converge within the upper limit of the frequency shift, trials with a shift of 0.5F0 were repeated 30 times. In that case, the proportion of correct trials was converted to d′, and then the “threshold” was derived on the assumption that d′ is proportional to the frequency shift (Moore and Sek 2009).
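The fallback estimate for non-converging tracks can be sketched as follows. The sketch assumes the standard relation for an unbiased observer in a two-interval task, d′ = √2·z(Pc), and takes as the target the d′ corresponding to 70.7 % correct (the convergence point of a two-down/one-up track). The clipping of extreme proportions and the function names are illustrative choices, not details from the source.

```python
import math
from statistics import NormalDist

def dprime_2afc(p_correct, n_trials):
    """d' for an unbiased observer in a two-interval forced-choice task."""
    # Keep the proportion away from 0 and 1 so that the z-transform stays finite.
    floor = 1.0 / (2.0 * n_trials)
    p = min(max(p_correct, floor), 1.0 - floor)
    return math.sqrt(2.0) * NormalDist().inv_cdf(p)

def extrapolated_threshold(p_correct, n_trials=30, tested_shift=0.5, target_p=0.707):
    """Shift (in units of F0) at which d' would reach the target value,
    assuming d' is proportional to the frequency shift."""
    d_measured = dprime_2afc(p_correct, n_trials)
    if d_measured <= 0.0:
        return None  # at or below chance: no finite estimate
    d_target = math.sqrt(2.0) * NormalDist().inv_cdf(target_p)
    return tested_shift * d_target / d_measured

estimate = extrapolated_threshold(18 / 30)  # e.g. 18 of 30 trials correct
```

A listener scoring below the target percentage at the 0.5F0 shift thus receives an extrapolated "threshold" larger than 0.5F0.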

2.3.2.4 LEVEL Task

In a 2I-2AFC trial, the listeners were required to indicate an interval containing a 400-ms-long SSMC or a transposed stimulus whose central 200-ms portion (including 20-ms raised-cosine ramps) was incremented in level by ΔL dB, while the other non-signal interval contained an original SSMC or a transposed stimulus.
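The level increment can be pictured as a time-varying gain applied to the 400-ms stimulus, as in the sketch below. Interpolating linearly in amplitude along the raised-cosine ramp is an assumption, as is the function name; the text specifies only the increment duration and the ramp duration.

```python
import math

def increment_gain(t, delta_db, total=0.4, inc_dur=0.2, ramp=0.02):
    """Amplitude gain at time t (s) for a level increment of delta_db."""
    start = (total - inc_dur) / 2.0      # increment occupies the central portion
    end = start + inc_dur
    if t <= start or t >= end:
        return 1.0
    if t < start + ramp:                 # rising raised-cosine ramp
        w = 0.5 * (1.0 - math.cos(math.pi * (t - start) / ramp))
    elif t > end - ramp:                 # falling raised-cosine ramp
        w = 0.5 * (1.0 - math.cos(math.pi * (end - t) / ramp))
    else:
        w = 1.0
    peak = 10.0 ** (delta_db / 20.0)
    # Assumption: the ramp interpolates linearly in amplitude between 1 and peak.
    return 1.0 + w * (peak - 1.0)

plateau = increment_gain(0.2, 3.0)
```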

2.3.2.5 FRES Task

The notched-noise masking method (Patterson et al. 1982) was employed to evaluate frequency selectivity. The signals were pure-tone bursts centred at 1100 or 4000 Hz, and the maskers were notched noises with varying notch width (0, 0.05, 0.1, 0.2, 0.3, and 0.4 relative to the signal frequency). The spectrum level within the passband was 40 dB SPL. A rounded-exponential filter (Patterson et al. 1982) was fitted to the experimental data using a least-squares fit. The equivalent rectangular bandwidth (ERB) was then derived from the parameters of the fitted filter.
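The fitting step can be sketched with the single-parameter roex(p) filter, W(g) = (1 + pg)·exp(−pg), where g is the deviation from the centre frequency relative to that frequency; for this filter shape the ERB equals 4fc/p. The sketch assumes a symmetric filter, a power-spectrum model of masking, noise bands extending to a hypothetical outer edge `g_max`, and a simple grid search; the source does not state which roex variant or optimizer was used.

```python
import math

def roex_integral(p, g_lo, g_hi):
    """Integral of W(g) = (1 + p*g) * exp(-p*g) from g_lo to g_hi."""
    def antiderivative(g):
        return -(2.0 + p * g) * math.exp(-p * g) / p
    return antiderivative(g_hi) - antiderivative(g_lo)

def predicted_threshold_db(p, notch, k_db, g_max=0.8):
    """Signal threshold (dB) for a given notch edge; g_max is an assumed value."""
    # Noise passes through both skirts of a symmetric filter, hence the factor 2.
    return k_db + 10.0 * math.log10(2.0 * roex_integral(p, notch, g_max))

def fit_roex(notches, thresholds_db, g_max=0.8):
    """Grid-search least-squares fit; returns (p, k_db)."""
    best = None
    for step in range(50, 801):          # p from 5.0 to 80.0 in steps of 0.1
        p = step / 10.0
        shape = [predicted_threshold_db(p, g, 0.0, g_max) for g in notches]
        # For a fixed p, the least-squares offset is the mean residual.
        k_db = sum(t - s for t, s in zip(thresholds_db, shape)) / len(notches)
        sse = sum((t - s - k_db) ** 2 for t, s in zip(thresholds_db, shape))
        if best is None or sse < best[0]:
            best = (sse, p, k_db)
    return best[1], best[2]

def erb_hz(p, fc):
    """Equivalent rectangular bandwidth of the roex(p) filter."""
    return 4.0 * fc / p

# Synthetic check: thresholds generated with p = 25 should be recovered.
notches = [0.0, 0.05, 0.1, 0.2, 0.3, 0.4]
synthetic = [predicted_threshold_db(25.0, g, 40.0) for g in notches]
p_fit, k_fit = fit_roex(notches, synthetic)
erb_fit = erb_hz(p_fit, 1100.0)
```

In this synthetic example the recovered slope parameter corresponds to an ERB of 176 Hz at 1100 Hz; the numbers are illustrative and are not data from the study.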

2.3.2.6 FDISC Task

Frequency difference limens were measured with pure-tone bursts centred at 1100 and 4000 Hz. Similarly to the TIME task, the sequences of RRRR and RSRS were presented, with R and S representing tone bursts with frequencies of fc and fc + Δf Hz, respectively (fc = 1100 or 4000 Hz). Each tone burst had a duration of 200 ms with 20-ms onset and offset ramps.

2.3.2.7 HUGGINS Task

Bandpass-filtered noises (passband: 250–4000 Hz) with a duration of 200 ms were used as stimuli. The noise was diotic except for a narrow frequency band centred around 1100 Hz (fc) with an 18 % width around the centre frequency, on which an interaural phase transition was imposed. This stimulus elicits a sensation of pitch corresponding to fc (Cramer and Huggins 1958). Similarly to the TIME task, the sequences of RRRR and RSRS were presented, with R and S representing noise bursts in which the phase-transition band was centred at fc and fc + Δf Hz, respectively. The discrimination threshold was expressed as Δf/fc.
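The interaural phase transition can be illustrated by the phase difference assigned to each frequency component of the noise. The sketch assumes a linear transition from 0 to 2π across the band and interprets the 18 % figure as the total width of the band relative to fc; both are assumptions, since the text does not give the exact phase profile.

```python
import math

def interaural_phase(freq_hz, fc=1100.0, rel_width=0.18):
    """Interaural phase difference (rad) for a component at freq_hz."""
    lo = fc * (1.0 - rel_width / 2.0)
    hi = fc * (1.0 + rel_width / 2.0)
    if freq_hz <= lo:
        return 0.0                      # diotic below the transition band
    if freq_hz >= hi:
        return 2.0 * math.pi            # 2*pi is equivalent to 0: diotic above the band
    return 2.0 * math.pi * (freq_hz - lo) / (hi - lo)

phase_at_fc = interaural_phase(1100.0)
```

With these assumptions the components are interaurally out of phase at fc and in phase away from the band.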

2.4 Data Analyses

MATLAB with the Statistics Toolbox was used for statistical analyses of the data. For the ITD, TIME, LEVEL, FDISC, and HUGGINS tasks, the analyses were performed on log-transformed threshold data.
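The group comparisons reported in Section 3 rely on one-way analysis of variance. The sketch below computes the F statistic after an optional log transform, written in plain Python purely for illustration (the analyses themselves were run in MATLAB). The function name and the example numbers are hypothetical; a p-value would additionally require the F distribution, which is omitted here.

```python
import math

def one_way_anova_f(groups, log_transform=True):
    """Return (F, df_between, df_within) for a list of threshold lists."""
    data = [[math.log10(x) if log_transform else x for x in g] for g in groups]
    n_total = sum(len(g) for g in data)
    grand_mean = sum(sum(g) for g in data) / n_total
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in data)
    # Variability of individual values around their own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in data)
    df_between = len(data) - 1
    df_within = n_total - len(data)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical thresholds for three listener groups (not data from the study).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 4], [2, 4, 8], [8, 16, 32]])
```

One consequence of the log transform is that the result does not depend on the unit of the thresholds: multiplying every value by a constant leaves F unchanged.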

3 Results

The left four columns of panels in Fig. 1 compare the performance of the four basic tasks (ITD, ILD, TIME, and LEVEL) between listener groups. For the low-frequency stimulus (the upper row of panels in Fig. 1), a one-way analysis of variance indicated a statistically significant effect of listener group in the TIME task only. Subsequent pair-wise comparisons indicated higher thresholds for the ENH and EHI listeners than for the YNH listeners. For the high-frequency stimulus (the lower row of panels), a significant effect of listener group was found in the ITD, TIME, and LEVEL tasks. These results reflect higher thresholds for EHI than for ENH (ITD task); for ENH and EHI than for YNH (TIME task); and for ENH than for YNH (LEVEL task). The listener-group effect was not significant for the ILD task for either stimulus type. The figure also shows the data obtained in the supplementary experiments (FRES, FDISC, and HUGGINS tasks). Significant listener-group effects were found in the HUGGINS task and the FDISC task (for the high-frequency stimulus only).

Fig. 1

Comparisons of thresholds among listener groups. Each column of panels represents one task, as labelled. The three columns of data within a panel correspond to, from left to right, YNH, ENH, and EHI, respectively. The 10th, 25th, 50th, 75th, and 90th percentiles are shown by the box-and-whisker plot. The red circle shows the mean, and the grey dots represent individual listeners’ data. Tasks for which a significant effect of listener group was revealed by one-way analysis of variance are indicated by the p-values. Asterisks indicate group pairs for which a significant difference was indicated by a post hoc test with Tukey’s honestly significant difference criterion (*p < 0.05; **p < 0.01; ***p < 0.001)

The thresholds of individual listeners are compared between pairs of tasks in Fig. 2. The results show statistically significant positive correlations for the ITD-ILD and ILD-LEVEL pairs. The partial correlation coefficients obtained by controlling for the effect of age (YNH versus ENH/EHI) were also significant. A significant partial correlation coefficient was also found for the ITD-TIME pair with the high-frequency stimulus.
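The effect of partialling out age can be illustrated with the first-order partial correlation, r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)), where z codes the age group (0 for young, 1 for elderly). The data below are invented to show how a strong raw correlation that arises only from a shared group difference can vanish, or even reverse, once the group is controlled for; they are not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation between x and y with the linear effect of z removed."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))

# Invented example: both measures are higher in group 1, but within each
# group they are inversely related.
age_group = [0, 0, 0, 0, 1, 1, 1, 1]
task_a = [1, 2, 3, 4, 11, 12, 13, 14]
task_b = [4, 3, 2, 1, 14, 13, 12, 11]
raw = pearson_r(task_a, task_b)
controlled = partial_r(task_a, task_b, age_group)
```

In this constructed case the raw coefficient is strongly positive while the partial coefficient is negative, which is why both values are reported for each task pair.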

Fig. 2

Comparisons of individual listeners’ thresholds between tasks for (a) low- and (b) high-frequency stimuli. Significant correlations are marked by straight lines obtained by orthogonal regression. See also Table 2

Table 2 Pearson’s correlation coefficients and p-values (in parentheses) for the data shown in Fig. 2. The second line of each entry indicates the values when the effect of age was partialled out. Significant correlations (p < 0.05) are shown in italics

4 Discussion

Elderly listeners (ENH and EHI) showed higher thresholds than young listeners (YNH) in the TIME task (with both low- and high-frequency stimuli) and the HUGGINS task. This confirms earlier findings that the sensitivity to temporal structure declines with age even without accompanying elevation of audiometric thresholds (e.g., Hopkins and Moore 2011). Thresholds for the LEVEL task also tended to be higher in the elderly than in the young, consistent with previous findings (e.g., He et al. 1998). Despite these consistent declines in the performance of the monaural tasks with age, the present study failed to find a significant age effect in the lateralization tasks (ITD and ILD).

The thresholds for the ILD and LEVEL tasks correlated for both the low- and high-frequency stimuli. This confirms the results of our earlier study (Ochi et al. 2014), suggesting that the efficiency of level coding in the auditory periphery is a major factor accounting for inter-individual variation of ILD sensitivity. The ITD task showed a correlation with the TIME task for the high-frequency stimulus when the factors of age and hearing impairment were controlled. This again is consistent with the finding of our earlier study (Ochi et al. 2014), suggesting that a listener’s ITD sensitivity is well accounted for by the listener’s sensitivity to temporal (envelope) structure.

Despite a relatively large number of participants and a greater range of threshold values in the TIME task, however, we failed to find a correlation between the ITD and TIME tasks for the low-frequency stimulus. The thresholds for the HUGGINS task showed a significant positive correlation with those for the TIME task (r = 0.50, p = 0.004), but not with those for the ITD task (r = 0.11, p = 0.553). This suggests that the performances of the TIME and HUGGINS tasks capture inter-individual variation of the efficiency of temporal-structure processing, but that the performance of the ITD task is determined primarily by other factors. The HUGGINS and ITD tasks are similar in that both require interaural comparison of temporal-structure information, but differ in the perceptual domain in which listeners are expected to respond (i.e., pitch versus laterality).

It should be noted, however, that the positive correlation found for the TIME and HUGGINS tasks was due predominantly to the consistent age effect in both tasks (see Fig. 1). When the effects of age were partialled out, the correlation coefficient was not significantly different from zero (r = −0.24, p = 0.191), implying that the positive correlation was due to underlying non-temporal factors that are sensitive to aging. A candidate for such a factor is frequency selectivity. It has been argued that (intermediately) resolved frequency components of the SSMC could contribute to the performance of the TIME task (Micheyl et al. 2010). Peripheral frequency resolution could also influence the representation of interaural correlation along the frequency axis, which is the basis for the Huggins pitch. Indeed, ERB was significantly correlated with thresholds for the TIME and HUGGINS tasks (ERB-TIME: r = 0.30, p = 0.047; ERB-HUGGINS: r = 0.35, p = 0.050; the effects of age were not partialled out).