Speech Coding in the Midbrain: Effects of Sensorineural Hearing Loss
In response to voiced speech sounds, auditory-nerve (AN) fibres phase-lock both to harmonics near best frequency (BF) and to the fundamental frequency (F0). Due to nonlinearities in the healthy ear, phase-locking in each frequency channel is dominated either by a single harmonic, for channels tuned near formants, or by F0, for channels between formants. The alternating dominance of these factors sets up a robust pattern of F0-synchronized rate across BF. This profile of a temporally coded measure is transformed into a mean-rate profile in the midbrain (inferior colliculus, IC), where neurons are sensitive to low-frequency fluctuations. In the impaired ear, the F0-synchronized rate profile is affected by several factors. Reduced synchrony capture decreases the dominance of a single harmonic near BF on the response. Elevated thresholds reduce the effect of rate saturation, resulting in increased F0-synchrony. Wider peripheral tuning results in a wider-band envelope with reduced F0 amplitude. In general, sensorineural hearing loss reduces the contrast in AN F0-synchronized rates across BF. Computational models for AN and IC neurons illustrate how hearing loss would affect the F0-synchronized rate profiles set up in response to voiced speech sounds.
Keywords: Vowels · Modulation transfer function · Neural coding
The impact of sensorineural hearing loss (SNHL) on speech communication is perhaps the most significant problem faced by listeners with hearing loss. A puzzle regarding the effects of SNHL is that relatively small amounts of loss have a surprisingly large impact on listeners, especially in noisy acoustic environments. Here we focus on the effects of SNHL on the representation of voiced speech sounds in the auditory periphery and in the midbrain (inferior colliculus, IC). Responses of models for auditory-nerve (AN) fibres and IC neurons provide insight into how changes in the temporal structure of AN responses impact representations at the level of the IC. Neurons in the IC are sensitive to low-frequency rate fluctuations, including rates synchronized to envelopes of amplitude-modulated (AM) stimuli.
Vowels are an essential component of all languages, and play an important role in carrying information, especially in running speech (e.g. Kewley-Port et al. 2007). Vowel sounds are characterized by voicing, which results in a spectrum with a fundamental frequency (F0), related to voice pitch, and harmonics of F0. The spectrum is shaped by vocal tract resonances, or formants. The frequencies of the two lowest formants (F1 and F2) distinguish different vowel sounds (Fant 1960). Neural coding of these formants is therefore essential for understanding how the brain processes speech.
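The description of a vowel spectrum above (harmonics of F0 shaped by formant resonances) can be sketched numerically. The following is a minimal illustration, not the stimulus generation used in this study: each formant is approximated by a generic second-order resonance, and the function name, formant frequencies and bandwidths are hypothetical values chosen only for the example.

```python
import numpy as np

def vowel_harmonic_levels(f0, formants, bandwidths, n_harmonics=40):
    """Levels (dB) of the harmonics of f0 after shaping by formant resonances.

    Each formant is modelled as a second-order resonance with centre
    frequency fc and bandwidth bw (both in Hz); this is an assumption
    made for illustration only.
    """
    freqs = float(f0) * np.arange(1, n_harmonics + 1)  # harmonic frequencies
    gain = np.ones(len(freqs))
    for fc, bw in zip(formants, bandwidths):
        # Magnitude response of a second-order resonator, equal to 1 at 0 Hz
        gain *= fc**2 / np.sqrt((fc**2 - freqs**2) ** 2 + (bw * freqs) ** 2)
    return freqs, 20.0 * np.log10(gain)

# Hypothetical vowel: F0 = 100 Hz, F1 = 500 Hz, F2 = 1500 Hz
freqs, levels = vowel_harmonic_levels(100, (500, 1500), (70, 90))
# Harmonics that are local spectral peaks mark the formant locations
peaks = [freqs[i] for i in range(1, len(levels) - 1)
         if levels[i] > levels[i - 1] and levels[i] > levels[i + 1]]
print(peaks)
```

In this sketch the harmonics nearest F1 and F2 stand out as local maxima of the spectral envelope, which is the feature that the neural codes discussed below must represent.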
Neural studies of vowel coding have focused on rate-place and fine-structure temporal coding of formants in AN responses (e.g. Sachs and Young 1979; Young and Sachs 1979; Delgutte and Kiang 1984; reviewed by Young 2008). Consideration of midbrain responses to vowels shifts the focus to the peripheral representation of low-frequency fluctuations (≲ 250 Hz) associated with F0. IC cells are particularly sensitive to amplitude modulation in the frequency range of F0 (Langner 1992). In the healthy ear, nonlinear AN response properties result in systematic patterns of neural fluctuations synchronized to F0 or higher harmonics near AN characteristic frequencies. In particular, synchrony capture results in the dominance of a single harmonic near a spectral peak on the fine-structure of AN responses to voiced sounds (Delgutte and Kiang 1984; Deng and Geisler 1987). After trauma-induced SNHL, synchrony capture is reduced, and AN fibres respond to multiple harmonics (Miller et al. 1997). These studies focused on how synchrony capture affects fine-structure coding; however, this mechanism also sets up a pattern of low-frequency fluctuations across the AN population. In the healthy ear, the fibres that are “captured” by a single harmonic have responses with relatively flat envelopes. In contrast, fibres tuned to frequencies between formants respond to multiple harmonics and have strong rate fluctuations at F0. The amplitude of F0-related neural fluctuations thus provides a robust code for the formant frequencies. In the ear with SNHL, decreased synchrony capture results in AN fibres that respond to multiple harmonics; thus F0-related fluctuations prevail across all driven frequency channels. The profile of F0-related fluctuations that codes the formants, and that ultimately drives modulation-sensitive midbrain neurons, is diminished by SNHL.
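The contrast between "captured" and uncaptured fibres can be illustrated with a toy signal. The sketch below is not the AN model used in this study: it assumes that a fibre's drive is simply a sum of equal-amplitude harmonics, uses half-wave rectification as a crude stand-in for hair-cell transduction, and measures the F0 component of the result relative to its mean. The function name and all parameter values are hypothetical.

```python
import numpy as np

def f0_fluctuation_strength(harmonic_numbers, f0=100.0, fs=20000.0, dur=0.2):
    """F0 magnitude, relative to the mean, in a rectified sum of harmonics."""
    t = np.arange(int(fs * dur)) / fs
    # Drive to the toy fibre: equal-amplitude harmonics of f0 (assumption)
    x = sum(np.cos(2 * np.pi * h * f0 * t) for h in harmonic_numbers)
    rectified = np.maximum(x, 0.0)  # crude stand-in for hair-cell rectification
    spectrum = np.abs(np.fft.rfft(rectified)) / len(t)
    freqs = np.fft.rfftfreq(len(t), 1.0 / fs)
    f0_bin = np.argmin(np.abs(freqs - f0))
    return spectrum[f0_bin] / spectrum[0]  # spectrum[0] is the mean (DC) term

# "Captured" fibre: response dominated by a single harmonic (5 x F0)
captured = f0_fluctuation_strength([5])
# Fibre between formants: responds to two neighbouring harmonics (7 and 8 x F0)
between = f0_fluctuation_strength([7, 8])
print(captured, between)
```

With a single harmonic the rectified waveform contains essentially no energy at F0, whereas two neighbouring harmonics beat at F0 and produce a clear F0 component. This is the sense in which the amplitude of F0-related fluctuations varies with BF relative to the formants.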
In this study, computational models illustrate vowel responses of AN fibres with and without SNHL. Next, models for three types of modulation sensitivity in the IC illustrate the effect of SNHL on population responses in the midbrain.
The Zilany et al. (2014) model for AN responses has a provision for including impairment due to outer and inner hair cell dysfunction (Zilany and Bruce 2007), which reduces synchrony capture in a manner comparable to that observed in acoustically traumatized ears (Miller et al. 1997). Here SNHL was simulated by setting AN model parameters COHC and CIHC to 0.3; these values simulate mild hearing loss, with threshold elevations ranging from about 15 dB at 500 Hz to 30 dB at 2000 Hz and higher.
The band-enhanced model (Fig. 1, BE) was Nelson and Carney’s (2004) same-frequency inhibitory-excitatory (SFIE) model. The key mechanism in this model is a dynamic interaction between relatively short-duration excitatory and relatively long-duration, delayed inhibitory synaptic potentials. The durations of the excitatory and inhibitory potentials and the delay of the inhibition with respect to the excitation determine the model’s best modulation frequency (BMF).
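The interaction between fast excitation and slower, delayed inhibition can be sketched as follows. This is a simplified, single-stage illustration of the idea rather than the published SFIE implementation: synaptic potentials are approximated by unit-area alpha functions, and the time constants, delay and inhibitory strength are illustrative assumptions, as are all function names.

```python
import numpy as np

def alpha_kernel(tau, fs, dur=0.03):
    """Alpha function t*exp(-t/tau), sampled at fs and scaled to unit sum."""
    t = np.arange(int(dur * fs)) / fs
    k = t * np.exp(-t / tau)
    return k / k.sum()

def sfie_stage(rate_in, fs, tau_exc=1e-3, tau_inh=2e-3, delay=2e-3,
               inh_strength=1.5):
    """Excitation minus stronger, slower, delayed inhibition, then rectified.

    All parameter values are hypothetical and chosen only for illustration.
    """
    exc = np.convolve(rate_in, alpha_kernel(tau_exc, fs))[:len(rate_in)]
    inh = np.convolve(rate_in, alpha_kernel(tau_inh, fs))[:len(rate_in)]
    d = int(round(delay * fs))
    inh = np.concatenate([np.zeros(d), inh[:len(inh) - d]])  # delay inhibition
    return np.maximum(exc - inh_strength * inh, 0.0)

def mean_rate_for_modulation(fm, fs=10000.0, dur=1.0, mean_rate=100.0):
    """Steady-state mean output for a fully modulated input rate at fm (Hz)."""
    t = np.arange(int(fs * dur)) / fs
    rate_in = mean_rate * (1.0 + np.cos(2 * np.pi * fm * t))
    out = sfie_stage(rate_in, fs)
    return out[len(out) // 2:].mean()  # discard the onset transient

for fm in (2.0, 100.0, 500.0):
    print(fm, mean_rate_for_modulation(fm))
```

With these assumed parameters the stage responds most strongly to the intermediate modulation frequency: slow fluctuations are cancelled by the stronger inhibition, and fast fluctuations are smoothed out by both synaptic filters. Changing the time constants or the delay shifts the frequency of the strongest response, which is the dependence on BMF described above.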
An extension of the SFIE model was used for the band-suppressed model (Fig. 1, BS). This model receives the same ascending excitatory input as the band-enhanced model, but it is inhibited by the band-enhanced model (Carney et al. 2015; Kim et al. 2015b). The BMF of the band-enhanced model determines the worst modulation frequency (WMF) for the band-suppressed model.
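A band-suppressed cell of this kind can be sketched by letting a band-enhanced stage inhibit a cell that receives the same ascending excitation. As with any toy version, this is an illustration under stated assumptions rather than the published model: the alpha-function synapses, time constants, delays and strengths are hypothetical, and the function names are invented for the example.

```python
import numpy as np

def alpha_filter(x, tau, fs, delay=0.0):
    """Filter x with a unit-sum alpha function, then delay by `delay` seconds."""
    t = np.arange(int(0.03 * fs)) / fs
    kernel = t * np.exp(-t / tau)
    kernel /= kernel.sum()
    y = np.convolve(x, kernel)[:len(x)]
    d = int(round(delay * fs))
    return np.concatenate([np.zeros(d), y[:len(y) - d]])

def band_enhanced(rate_in, fs):
    """Fast excitation minus slower, delayed inhibition (illustrative values)."""
    exc = alpha_filter(rate_in, 1e-3, fs)
    inh = alpha_filter(rate_in, 2e-3, fs, delay=2e-3)
    return np.maximum(exc - 1.5 * inh, 0.0)

def band_suppressed(rate_in, fs, inh_strength=2.0):
    """Same ascending excitation, inhibited by the band-enhanced output."""
    exc = alpha_filter(rate_in, 1e-3, fs)
    inh = alpha_filter(band_enhanced(rate_in, fs), 2e-3, fs, delay=1e-3)
    return np.maximum(exc - inh_strength * inh, 0.0)

def steady_rate(model, fm, fs=10000.0, dur=1.0, mean_rate=100.0):
    """Steady-state mean output of `model` for input modulated at fm (Hz)."""
    t = np.arange(int(fs * dur)) / fs
    rate_in = mean_rate * (1.0 + np.cos(2 * np.pi * fm * t))
    out = model(rate_in, fs)
    return out[len(out) // 2:].mean()  # discard the onset transient

for fm in (2.0, 100.0):
    print(fm, steady_rate(band_enhanced, fm), steady_rate(band_suppressed, fm))
```

In this sketch the band-suppressed rate drops at the modulation frequency where the band-enhanced stage responds well and stays near the mean input rate where it does not, consistent with the BMF of the inhibitory input setting the WMF of the band-suppressed cell.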
Finally, the hybrid MTF model receives the same ascending excitatory input as the other model IC neurons, but is inhibited by both band-enhanced and band-suppressed cells (Fig. 1, Hybr). The relation between the BMF and WMF of the inhibitory inputs determines the shape of the hybrid model cell’s MTF.
Figure 2d, f illustrates the temporal responses of AN fibres to vowels in terms of their dominant components, which are the largest spectral components of the post-stimulus time histograms (Delgutte and Kiang 1984; Miller et al. 1997). Dominant components for the healthy AN model (Fig. 2d) show synchrony capture, or dominance of the temporal responses by a single harmonic near the formant peak. The synchrony capture extends across a span of fibres with BFs near each formant frequency (red arrows). The dominance of these responses by a harmonic near the formants (orange arrows) results in an absence of synchrony to F0 (black arrow) for BFs near formants. In contrast, fibres tuned between the formants are not dominated by a single harmonic. Because these fibres respond to multiple harmonics near their BFs (green arrow), their responses synchronize to the “beats” at F0 (black arrow).
The dominant components for model AN fibres with SNHL simulated by outer and inner hair cell dysfunction are shown in Fig. 2f. Physiological studies have shown that synchrony capture is weaker in these fibres (Miller et al. 1997). The model response illustrates reduced synchrony capture, especially to F2 (Fig. 2f, red arrow), even for the mild hearing loss in this simulation. The reduced synchrony capture is more evident in the pattern of synchrony to F0, which extends across all regions where there is sufficient energy to drive the neurons (Fig. 2f, black arrow). Thus, the profile of F0-synchronized rates differs qualitatively between the impaired and the healthy AN populations. For BFs tuned between F1 and F2 (green arrow), the response to F0 (black arrow) is reduced in the impaired nerve, whereas in the BF regions of F1 and of F2 (orange arrows), the response to F0 is increased in the impaired nerve. The net effect of SNHL is a reduction in the contrast of the F0-synchronized rates across BF.
The rate profile for healthy band-suppressed IC model neurons has peaks at the formant frequencies (Fig. 4a, magenta). These cells respond more strongly at frequencies where the low-frequency fluctuations are reduced (Fig. 2d, orange arrows). The peaks in the rate profile at the formant frequencies are robust in the model hybrid population (Fig. 4a, green). This model result motivates further physiological and computational investigation of these cells. The hybrid neuron and model in Fig. 3 are most strongly driven by a contrast in neural fluctuations (i.e., strong neural fluctuations in the 8–30 Hz range and reduced fluctuations in the 30–200 Hz range); other MTF shapes are also observed for hybrid neurons (Krishna and Semple 2000).
Model IC responses with impaired AN inputs (Fig. 4b) differ dramatically from the healthy responses. The band-enhanced rate profile (Fig. 4b, blue) has peaks approximately where the healthy model has valleys, and vice versa. The impaired model profile is explained by reduced synchrony capture: the impaired AN model synchronizes to F0 whenever the fibres are adequately driven by stimulus energy (see Fig. 2f). Thus the impaired band-enhanced rate profile reflects the energy spectrum of the vowel, unlike the healthy rate profile, which is inversely related to energy. The impaired band-suppressed (Fig. 4b, magenta) and hybrid (Fig. 4b, green) model responses also have peaks near F1 and F2. Unlike the healthy case, the inhibition from the shifted peaks in the impaired band-enhanced model suppresses the peaks in the band-suppressed and hybrid models. These results emphasize that the strong rate profiles in the healthy IC model are not simply explained by the rate vs. BF profile of the AN. The healthy band-suppressed and hybrid profiles are created by a synchrony-to-rate transformation between the AN and IC and are enhanced by disinhibition at the formant frequencies. In general, the rate profiles for all three response types are qualitatively different for this mildly impaired model due to differences in the F0-synchronized rate between the healthy and impaired AN models (Fig. 2d, f).
Vowel coding studies have historically focused on the representation of spectral peaks, or formants, in AN rates or in synchrony to stimulus components near BF. These representations are vulnerable to additive noise and vary with sound level. The sensitivity of midbrain neurons to low-frequency fluctuations inspired a shift of focus to the contrast in pitch-related fluctuations along the BF axis in vowel coding (Carney et al. 2015). In the healthy AN, there is a robust contrast in the profile of the F0-synchronized discharge rate across the BF axis (Fig. 2d). Midbrain neurons’ sensitivity to these F0-synchronized rates transforms the temporally coded AN profile into a discharge rate profile (Fig. 4a).
The representation of F0-synchronized rates is affected by several mechanisms that are influenced by SNHL: synchrony capture, threshold (and thus rate saturation), and bandwidth of tuning. Bandwidth, in turn, affects the modulation spectrum of peripheral responses. Wider peripheral bandwidths associated with SNHL result in a wider modulation bandwidth and a reduction of the amplitude of low-frequency fluctuations. These factors have little influence on the mean rates of AN fibres; however, they have a large effect on F0-synchrony in the AN, which in turn affects the responses of IC neurons.
These IC model responses have implications for the design of signal-processing strategies for listeners with SNHL. Recreating stimulus spectra in the AN rate profiles will not elicit appropriate responses from central neurons, whereas restitution of the F0-synchronized rate profiles may (Rao and Carney 2014). This result suggests an experimental question: If the profile of peripheral F0-synchrony in response to voiced sounds can be restored to normal for these listeners, can intelligibility of speech, and especially speech in background noise, be improved? The results of such a test would depend on the status of central auditory pathways, which may undergo change following cochlear trauma (Suneja et al. 1998; Salvi et al. 2000).
Supported by NIDCD-010813.
- Carney LH, Li T, McDonough JM (2015) Speech coding in the brain: representation of formants by midbrain neurons tuned to sound fluctuations. eNeuro 2(4) e0004-15.2015 1–1. doi: 10.1523/ENEURO.0004-15.2015
- Fant G (1960) Acoustic theory of speech production. Mouton, The Hague
- Kim DO, Bishop BB, Kuwada S, Carney LH (2015a) Band-enhanced and band-suppressed rate modulation transfer functions of inferior colliculus neurons and a model: effects of duty cycle and rise/fall rate. Association for Research in Otolaryngology, 38th Annual Midwinter Meeting, abstract #PS-423
Open Access This chapter is distributed under the terms of the Creative Commons Attribution-Noncommercial 2.5 License (http://creativecommons.org/licenses/by-nc/2.5/) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
The images or other third party material in this chapter are included in the work's Creative Commons license, unless indicated otherwise in the credit line; if such material is not included in the work's Creative Commons license and the respective action is not permitted by statutory regulation, users will need to obtain permission from the license holder to duplicate, adapt or reproduce the material.