Speech Masking in Normal and Impaired Hearing: Interactions Between Frequency Selectivity and Inherent Temporal Fluctuations in Noise
Recent studies in normal-hearing listeners have used envelope-vocoded stimuli to show that the masking of speech by noise is dominated by the temporal-envelope fluctuations inherent in noise, rather than just overall power. Because these studies were based on vocoding, it was expected that cochlear-implant (CI) users would demonstrate a similar sensitivity to inherent fluctuations. In contrast, it was found that CI users showed no difference in speech intelligibility between maskers with and without inherent envelope fluctuations. Here, these initial findings in CI users were extended to listeners with cochlear hearing loss and the results were compared with those from normal-hearing listeners at either equal sensation level or equal sound pressure level. The results from hearing-impaired listeners (and in normal-hearing listeners at high sound levels) are consistent with a relative reduction in low-frequency inherent noise fluctuations due to broader cochlear filtering. The reduced effect of inherent temporal fluctuations in noise, due to either current spread (in CI users) or broader cochlear filters (in hearing-impaired listeners), provides a new way to explain the loss of masking release experienced in CI users and hearing-impaired listeners when additional amplitude fluctuations are introduced in noise maskers.
Keywords: Cochlear hearing loss; Hearing in noise; Speech perception
Speech perception is a major communication challenge for people with hearing loss and with cochlear implants (CIs), particularly when the speech is embedded in background noise (e.g., Humes et al. 2002; Zeng 2004). Earlier work suggested that speech perception in noise is limited by the overall noise energy (French and Steinberg 1947; Kryter 1962; George et al. 2008), but more recent work suggests that the limiting factor is instead the energy in the inherent temporal-envelope modulations of the noise (Dubbelboer and Houtgast 2008; Jorgensen and Dau 2011; Stone et al. 2011, 2012; Jorgensen et al. 2013; Stone and Moore 2014). In a recent study (Oxenham and Kreft 2014) we examined the effects of inherent noise fluctuations in CI users. In contrast to the results from normal-hearing (NH) listeners, we found that CI users showed no benefit when the masker lacked inherent fluctuations. Further experiments suggested that, for CI users, the effective inherent fluctuations in the noise envelope were reduced by current spread (interactions between adjacent electrodes), which leads to smoother temporal envelopes.
One reason to expect that hearing-impaired (HI) listeners may also be less affected by inherent noise fluctuations is that broadening the filters alters the modulation spectrum. For an ideal rectangular filter, the modulation power spectrum of Gaussian noise after filtering is triangular, decreasing linearly from its maximum at 0 Hz to zero at a modulation frequency equal to the filter bandwidth (Lawson and Uhlenbeck 1950). Although widening the filter does not alter the area under the modulation spectrum, it spreads the modulation power over a wider range of modulation frequencies, leaving relatively less power at low modulation frequencies (see Fig. 1). Because low modulation frequencies are the most important for speech, this relative reduction may weaken the influence of inherent fluctuations for listeners whose auditory filters have been broadened by hearing loss.
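The bandwidth argument can be made concrete with a short calculation. The sketch below integrates the triangular modulation spectrum in closed form: with unit total area over 0 to W Hz, the share of modulation power below a cutoff c is 2x - x^2, where x = c/W. As assumptions not taken from the source, it treats the normal auditory filter as rectangular with a bandwidth equal to the standard Glasberg and Moore (1990) ERB, and it uses a hypothetical 1000-Hz center frequency, a hypothetical 16-Hz upper limit for "speech-relevant" modulations, and a hypothetical two-fold broadening for an impaired filter.

```python
# Sketch: how filter bandwidth redistributes inherent-noise modulation power,
# assuming an ideal rectangular filter (triangular modulation spectrum).


def erb_hz(cf_hz: float) -> float:
    """Normal-hearing equivalent rectangular bandwidth (Glasberg & Moore, 1990), in Hz."""
    return 24.7 * (4.37 * cf_hz / 1000.0 + 1.0)


def mod_density(f_mod: float, bandwidth: float) -> float:
    """Triangular modulation power density, normalized to unit area over 0..bandwidth Hz."""
    if f_mod < 0.0 or f_mod >= bandwidth:
        return 0.0
    return 2.0 * (bandwidth - f_mod) / bandwidth ** 2


def fraction_below(cutoff: float, bandwidth: float) -> float:
    """Share of total modulation power below `cutoff` Hz (closed-form integral of the triangle)."""
    x = min(max(cutoff / bandwidth, 0.0), 1.0)
    return 2.0 * x - x * x


cf = 1000.0           # hypothetical channel center frequency (Hz)
speech_cutoff = 16.0  # hypothetical upper limit of speech-relevant modulations (Hz)
broadening = 2.0      # hypothetical broadening factor for an impaired filter

for label, bw in [("normal", erb_hz(cf)), ("broadened", broadening * erb_hz(cf))]:
    print(
        f"{label:9s} BW = {bw:6.1f} Hz, density at 0 Hz = {mod_density(0.0, bw):.4f} per Hz, "
        f"share below {speech_cutoff:g} Hz = {fraction_below(speech_cutoff, bw):.3f}"
    )
```

Under these assumptions, doubling the bandwidth halves the modulation power density at 0 Hz and lowers the share of power below 16 Hz from about 0.23 to about 0.12, even though the total area is unchanged.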
The aim of this experiment was to test the resulting prediction that hearing loss reduces the effect of inherent noise fluctuations on speech masking. We compared the performance of listeners with cochlear hearing loss with that of young NH listeners and of age-matched NH listeners. Performance was compared at roughly equal sensation levels (SL) and at equal sound pressure levels (SPL), the latter to test for effects of overall level on performance in NH listeners.
Nine listeners with mild-to-moderate sensorineural hearing loss (4 male and 5 female; mean age 61.2 years) took part in this experiment. Their four-frequency pure-tone average thresholds (4F-PTA, averaged over 500, 1000, 2000, and 4000 Hz) ranged from about 25 to 65 dB HL (mean approximately 40 dB HL). Nine listeners with clinically normal hearing (defined as thresholds of 20 dB HL or less at octave frequencies between 250 and 4000 Hz; mean 4F-PTA 7.6 dB HL), who were matched for age (mean age 62.2 years) and gender with the HI listeners, were run as the primary comparison group. In addition, a group of four young NH listeners (mean age 20.5 years; mean 4F-PTA 2.8 dB HL) was tested. All experimental protocols were approved by the Institutional Review Board of the University of Minnesota, and all listeners provided informed written consent prior to participation.
The speech and the masker were mixed and low-pass filtered at 4000 Hz, and were then either presented unprocessed or passed through a tone-excited envelope vocoder that simulates certain aspects of CI processing (Dorman et al. 1998; Whitmal et al. 2007). The vocoder divided the stimulus into 16 frequency subbands, with the same center frequencies as the 16 tone maskers. The temporal envelope of each subband was extracted using a Hilbert transform and then low-pass filtered with a 4th-order Butterworth filter with a cutoff frequency of 50 Hz. This cutoff frequency was chosen to reduce possible voicing-periodicity cues and to reduce the possibility that the vocoding produced spectrally resolved components via the amplitude modulation. Each temporal envelope was then used to modulate a pure tone at the center frequency of the respective subband.
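The vocoder chain described above can be sketched in Python with NumPy and SciPy. This is a minimal illustration, not the authors' implementation: the source specifies the 16 channels, Hilbert envelope extraction, and the 4th-order, 50-Hz Butterworth envelope filter, but not the analysis filters or exact center frequencies, so the log-spaced channel edges (100 to 4000 Hz), the 4th-order Butterworth band-pass analysis filters, and the carriers at each band's geometric center are hypothetical. The input is assumed to be the speech-plus-masker mixture after the 4000-Hz low-pass filter.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfilt


def tone_vocoder(x, fs, n_channels=16, f_lo=100.0, f_hi=4000.0, env_cutoff=50.0):
    """Sketch of a tone-excited envelope vocoder.

    Hypothetical choices (not given in the source): log-spaced channel edges
    between f_lo and f_hi, 4th-order Butterworth band-pass analysis filters,
    and carrier tones at the geometric center of each band. Requires fs / 2 > f_hi.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size) / fs
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # hypothetical channel edges
    # Envelope smoothing: 4th-order Butterworth low-pass at 50 Hz, as described in the text.
    env_sos = butter(4, env_cutoff, btype="lowpass", fs=fs, output="sos")
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        # butter(2, [lo, hi], "bandpass") gives a 4th-order band-pass filter.
        band_sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(band_sos, x)
        env = np.abs(hilbert(band))                   # Hilbert envelope of the subband
        env = np.maximum(sosfilt(env_sos, env), 0.0)  # smooth, then clip filter undershoot
        fc = np.sqrt(lo * hi)                         # hypothetical carrier frequency
        out += env * np.sin(2.0 * np.pi * fc * t)     # envelope-modulated pure tone
    return out


fs = 16000
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs)  # 1 s of Gaussian noise as a stand-in input
vocoded = tone_vocoder(noise, fs)
```

Clipping the smoothed envelope at zero is a design choice of this sketch: a Butterworth low-pass filter can undershoot slightly below zero after abrupt envelope changes, and a negative envelope would invert the carrier's phase.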
The stimuli were generated digitally, converted via a 24-bit digital-to-analog converter, and presented via headphones. The stimuli were presented to one ear (the better ear in the HI listeners), and a speech-shaped noise was presented to the opposite ear at a level 30 dB below that of the speech. Listeners were seated individually in a double-walled sound-attenuating booth and responded to each sentence by typing what they heard on a computer keyboard. Performance was scored as the number of keywords correctly reported, expressed as a proportion of the total number of keywords presented. One sentence list (of 20 sentences) was completed for each masker type and masker level. Presentation was blocked by condition (natural or vocoded, and speech level), and the order of presentation was counterbalanced across listeners. The order of signal-to-masker ratios was randomized within each block.
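Keyword scoring of typed responses can be sketched as follows. The source does not say how typed words were matched to keywords, so the rule here is hypothetical: case-insensitive exact matching after stripping punctuation, with no credit for misspellings or morphological variants, and each typed word able to match at most one keyword. The example sentences and keywords are invented for illustration and are not from the study's materials.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def keywords_correct(keywords: list[str], response: str) -> int:
    """Count keywords found in a typed response; each typed word can match only once."""
    typed = Counter(tokenize(response))
    needed = Counter(k.lower() for k in keywords)
    return sum(min(n, typed[w]) for w, n in needed.items())


def proportion_correct(trials: list[tuple[list[str], str]]) -> float:
    """Keywords correct divided by keywords presented, pooled over one sentence list."""
    total = sum(len(keywords) for keywords, _ in trials)
    correct = sum(keywords_correct(keywords, response) for keywords, response in trials)
    return correct / total if total else float("nan")


# Hypothetical trials: (keywords presented, typed response).
trials = [
    (["boy", "ran", "store"], "The boy ran to the shop"),
    (["cat", "sat", "mat"], "the cat sat on the mat"),
]
score = proportion_correct(trials)  # 5 of 6 keywords correct
```

Pooling keywords over the whole list, rather than averaging per-sentence proportions, weights every keyword equally, which matches scoring "as a proportion of the total number of keywords presented".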
The results from the conditions where the speech was presented at roughly equal sensation level to all participants (85 dB SPL for the HI group, and 40 dB SL for the NH groups) are shown in Fig. 3. The upper row shows results without vocoding; the lower row shows results with vocoding. The results from the young NH, age-matched NH, and HI listeners are shown in the left, middle and right columns, respectively.
Consider next the results from the vocoded conditions (lower row of Fig. 3). Here, the benefits of frequency selectivity have been reduced by limiting spectral resolution to the 16 vocoder channels. As expected, the MT masker produces results very similar to those for the GN masker, as both produce very similar outputs from the tone vocoder. The young NH and the age-matched NH groups seem able to take similar advantage of the lack of inherent masker fluctuations in the PT condition. In contrast, the differences between the PT and MT conditions seem less pronounced in the HI listeners; although some differences remain (unlike in CI users), they are smaller than in the NH listeners.
Overall, the HI group showed smaller-than-normal differences between maskers with and without inherent fluctuations. This loss of sensitivity to inherent fluctuations was particularly apparent in the vocoded conditions (Fig. 3, lower right panel). However, unlike the earlier results from CI users (Oxenham and Kreft 2014), some differences remained between conditions with and without inherent fluctuations, suggesting that the effects of poorer frequency selectivity are less profound in HI listeners than in CI users. There are two possible reasons for this. First, frequency selectivity in listeners with mild-to-moderate hearing loss is not as poor as that in CI users. Second, the spectral interaction occurs at a different stage in the two groups: in HI listeners, the broadened cochlear filters act on the acoustic signal before envelope extraction, whereas in CI users, current spread combines channels after their envelopes have been extracted. This difference may make the reduction in masker modulation power greater for CI users than for HI listeners, even with a similar loss of frequency selectivity.
One interesting outcome was that the differences between the NH groups and the HI group were not very pronounced when the speech was presented to all groups at the same high SPL. It is well known that frequency selectivity in NH listeners becomes poorer at high levels (e.g., Nelson and Freyman 1984; Glasberg and Moore 2000). It appears that this level-dependent broadening of the auditory filters in NH listeners is sufficient to reduce the effects of inherent masker fluctuations.
Taken together, the results show that the importance of inherent masker fluctuations in determining speech intelligibility in noise depends to some extent on the listening conditions and the listener population. It cannot be claimed that inherent masker fluctuations always limit speech perception: their effect is absent in CI users (Oxenham and Kreft 2014) and is greatly reduced in HI listeners and in NH listeners at high sound levels. Finally, the reduced effect of inherent fluctuations may also help explain why HI listeners (as well as CI users) exhibit less masking release when additional fluctuations are imposed on the masker.
This work was supported by NIH grant R01 DC012262.
- Lawson JL, Uhlenbeck GE (1950) Threshold signals, vol 24. McGraw Hill, New York
Open Access: This chapter is distributed under the terms of the Creative Commons Attribution-Noncommercial 2.5 License (http://creativecommons.org/licenses/by-nc/2.5/) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
The images or other third party material in this chapter are included in the work's Creative Commons license, unless indicated otherwise in the credit line; if such material is not included in the work's Creative Commons license and the respective action is not permitted by statutory regulation, users will need to obtain permission from the license holder to duplicate, adapt or reproduce the material.