Investigating the Role of Working Memory in Speech-in-noise Identification for Listeners with Normal Hearing
With the advent of cognitive hearing science, increased attention has been given to individual differences in cognitive functioning and their explanatory power in accounting for inter-listener variability in understanding speech in noise (SiN). The psychological construct that has received most interest is working memory (WM), representing the ability to simultaneously store and process information. Common lore and theoretical models assume that WM-based processes subtend speech processing in adverse perceptual conditions, such as those associated with hearing loss or background noise. Empirical evidence confirms the association between WM capacity (WMC) and SiN identification in older hearing-impaired listeners. To assess whether WMC also plays a role when listeners without hearing loss process speech in acoustically adverse conditions, we surveyed published and unpublished studies in which the Reading-Span test (a widely used measure of WMC) was administered in conjunction with a measure of SiN identification. The survey revealed little or no evidence for an association between WMC and SiN performance. We also analysed new data from 132 normal-hearing participants, sampled from across the adult lifespan (18–91 years), to test for a relationship between Reading-Span scores and the identification of Matrix sentences in noise. Performance on both tasks declined with age, and the two measures correlated weakly even after controlling for the effects of age and audibility (r = 0.39, p ≤ 0.001, one-tailed). However, separate analyses for different age groups revealed that the correlation was significant only for the middle-aged and older groups, not for the young (< 40 years) participants.
Keywords: Aging, Audiometrically normal, Correlations, Cognition, Noise, Older listeners, Reading-span test, Speech intelligibility, Working-memory capacity, Young listeners, Matrix sentences
1 Introduction
Recent years have seen an increased interest in the role of individual differences in cognitive functioning in speech and language processing and their interaction with different types of listening tasks and conditions. The psychological construct that has received the most attention in the emerging field of cognitive hearing science is working memory (WM), possibly because it has been shown to be involved in a wide range of complex cognitive behaviours (e.g. reading comprehension, reasoning, complex learning). WM can be conceptualised as the cognitive system that is responsible for active maintenance of information in the face of ongoing processing and/or distraction. Its capacity (WMC) is generally assessed by so-called complex span tasks, requiring the temporary storage and simultaneous processing of information. For example, in one of the most widely used WM tasks, the Reading-Span test (Baddeley et al. 1985), participants read visually presented sentences and judge their semantic correctness (processing component), while trying to remember parts of their content for recall after a variable number of sentences (storage component).
A growing body of evidence from studies using mainly older hearing-impaired (HI) listeners indeed confirms that higher WMC is related to better unaided and aided speech-in-noise (SiN) identification, with correlation coefficients frequently exceeding 0.50 (Lunner 2003; Foo et al. 2007; Lunner and Sundewall-Thorén 2007; Arehart et al. 2013). In addition, high-WMC listeners were less affected by signal distortion introduced by hearing-aid processing (e.g. frequency or dynamic-range compression).
Consistent with these results, models of speech/language processing have started incorporating active cognitive processes (Rönnberg et al. 2013; Heald and Nusbaum 2014). For example, according to the Ease of Language Understanding model (Rönnberg et al. 2013), any mismatch between the perceptual speech input and the phonological representations stored in long-term memory disrupts automatic lexical retrieval, resulting in the use of explicit, effortful processing mechanisms based on WM. Both internal distortions (i.e., related to the integrity of the auditory, linguistic, and cognitive systems) and external distortions (e.g. background noise) are purported to contribute to the mismatch. Consequently, it is assumed that WMC also plays a role when individuals with normal hearing (NH) have to process spoken language in acoustically adverse conditions.
However, Füllgrabe et al. (2015) recently failed to observe a link between Reading-Span scores and SiN identification in older listeners (≥ 60 years) with audiometrically NH (≤ 20 dB HL between 0.125 and 6 kHz), using a range of target speech (consonants and sentences), maskers (unmodulated and modulated noise, interfering babble), and signal-to-noise ratios (SNRs).
2 Study Survey
To assess the claim that individual variability in WMC accounts for differences in SiN identification even in the absence of peripheral hearing loss, we surveyed published and unpublished studies administering the Reading-Span test and a measure of SiN identification to participants with audiometrically NH. To ensure consistency with experimental conditions in investigations of HI participants, only studies presenting sentence material “traditionally” used in hearing research (i.e., ASL, Hagerman, HINT, IEEE, QuickSIN, or Versfeld sentences) against co-located background maskers were considered. In addition, we only examined studies in which the effect of age was controlled for (either by statistically partialling it out or by restricting the analysis to a “narrow” age range), in order to avoid inflated estimates of the correlation between WMC and SiN tasks caused by the tendency for performance in both kinds of tasks to worsen with age. Figure 1 summarizes the results of this survey.
Correlation coefficients in the surveyed studies are broadly distributed, spanning almost half of the possible range of r values (i.e., from −0.29 to 0.58). Confidence intervals (CIs) are generally large and include zero (i.e., the null hypothesis of no correlation) in 21/25 and 24/25 cases for 95 and 99 % CIs, respectively, suggesting that these studies are underpowered. For the relatively small number of studies included in this survey, there is no consistent trend for stronger correlations in more complex and/or informationally masking backgrounds or at lower SNRs, both of which presumably correspond to more adverse listening conditions.
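A CI for a correlation coefficient is usually obtained via Fisher's r-to-z transformation. The minimal sketch below, assuming the standard normal approximation with a standard error of 1/√(N − 3), shows how one could check whether a study's 95 or 99 % CI includes zero; the example r and N are hypothetical and are not taken from any surveyed study.

```python
import math

# Two-sided critical values of the standard normal distribution.
Z_CRIT = {95: 1.959964, 99: 2.575829}

def r_confidence_interval(r, n, level=95):
    """Approximate CI for Pearson's r via Fisher's r-to-z transformation."""
    z = math.atanh(r)                              # r -> z
    half_width = Z_CRIT[level] / math.sqrt(n - 3)  # SE of z is 1/sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)  # back to r

def ci_includes_zero(r, n, level=95):
    """True if the CI is compatible with the null hypothesis of no correlation."""
    lower, upper = r_confidence_interval(r, n, level)
    return lower <= 0.0 <= upper

# Hypothetical study: r = 0.30 with N = 30 participants.
print(r_confidence_interval(0.30, 30))  # roughly (-0.07, 0.60)
print(ci_includes_zero(0.30, 30, 99))   # True
```

With N = 30, even a correlation of 0.30 is compatible with zero, which illustrates why small samples yield the wide CIs described above.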
Across studies restricting their sample to young (18–40 years) participants, the weighted average r value is 0.12, accounting for less than 2 % of the variance in SiN identification (r² ≈ 0.014). According to a power calculation, 543 participants would be required to have an 80 % chance of detecting such a small effect with p = 0.05 (one-tailed)!
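A sample size of this order can be approximated with the common Fisher-z formula, N = ((z₁₋α + z₁₋β) / atanh(r))² + 3. The text does not state which method was used, so the sketch below assumes this approximation; it relies on the standard-library `statistics.NormalDist` for the normal quantiles.

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80, tails=1):
    """Approximate N needed to detect a population correlation r (Fisher-z method)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / tails)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)               # quantile for the desired power
    effect = math.atanh(r)                             # r on the Fisher-z scale
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)

print(n_for_correlation(0.12, tails=1))  # 429
print(n_for_correlation(0.12, tails=2))  # 543
```

Under this approximation, r = 0.12 requires 429 participants for a one-tailed test and 543 for a two-tailed test; the exact figure depends on the test direction and calculation method, but in either case several hundred participants are needed.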
3 Analysis of Cohort Data for Audiometrically Normal-Hearing Participants
Given the mixed results from previous studies based on relatively small sample sizes, we re-analysed data from a subset of a large cohort of NH listeners taking part in another study.
3.1 Method
Participants were 132 native-English-speaking adults, sampled continuously from across the adult lifespan (range = 18–91 years). Older (≥ 60 years) participants were screened using the Mini Mental State Examination to confirm the absence of cognitive impairment. All participants had individual audiometric hearing thresholds of ≤ 20 dB HL at octave frequencies between 0.125 and 4 kHz, as well as at 3 kHz, in the test ear. Despite clinically “normal” audibility, the pure-tone average (PTA) threshold for the tested frequency range increased with age, i.e., sensitivity declined (r = 0.65, p ≤ 0.001, one-tailed). Since changes in sensitivity even within the normal audiometric range can affect SiN identification (Dubno and Ahlstrom 1997), PTA was treated as a possible confounding variable in analyses involving the entire cohort.
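The audiometric inclusion criterion and the PTA can be expressed as a short check. This is a sketch assuming that the PTA is the unweighted mean over exactly the frequencies listed above (0.125, 0.25, 0.5, 1, 2, 3 and 4 kHz); the dictionary layout and function names are hypothetical.

```python
# Test frequencies (kHz) from the inclusion criterion; assumed to define the PTA.
FREQS_KHZ = (0.125, 0.25, 0.5, 1, 2, 3, 4)
LIMIT_DB_HL = 20

def audiometrically_normal(thresholds):
    """thresholds maps frequency (kHz) -> threshold (dB HL) for the test ear."""
    return all(thresholds[f] <= LIMIT_DB_HL for f in FREQS_KHZ)

def pure_tone_average(thresholds):
    """Unweighted mean threshold (dB HL) across the test frequencies."""
    return sum(thresholds[f] for f in FREQS_KHZ) / len(FREQS_KHZ)

# Hypothetical audiogram of one participant.
example = {0.125: 5, 0.25: 5, 0.5: 10, 1: 10, 2: 10, 3: 15, 4: 15}
print(audiometrically_normal(example), pure_tone_average(example))  # True 10.0
```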
WMC was assessed by means of the computerized version of the Reading-Span test (Rönnberg et al. 1989). Individual sentences were presented in three parts on a computer screen to be read aloud and judged as plausible or implausible. After three to six sentences, either the first or last word of each of the sentences had to be recalled. WMC corresponded to the number of correctly recalled words in any order.
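The scoring rule described above (number of correctly recalled words, in any order) can be sketched as follows. Case-insensitive matching, the absence of partial credit, and the example words are assumptions for illustration; the plausibility judgements are not scored here.

```python
from collections import Counter

def score_set(targets, responses):
    """Number of target words recalled, regardless of order.

    Each target word is credited at most once, so repeating a word earns nothing extra.
    """
    remaining = Counter(word.lower() for word in targets)
    hits = 0
    for word in responses:
        key = word.lower()
        if remaining[key] > 0:  # Counter returns 0 for unseen words
            remaining[key] -= 1
            hits += 1
    return hits

def reading_span_score(sets):
    """Total WMC score over all sentence sets; each set is (targets, responses)."""
    return sum(score_set(targets, responses) for targets, responses in sets)

# Hypothetical three-sentence set: targets are the cued (first or last) words.
print(score_set(["boat", "girl", "apple"], ["apple", "boat", "tree"]))  # 2
```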
SiN identification was assessed using the English version of the Matrix sentence test (Vlaming et al. 2011). Each target sentence, presented monaurally at 70 dB SPL, followed a fixed syntactic structure (proper noun, verb, numeral, adjective, noun) but had low semantic redundancy. The noise maskers had the same long-term spectrum as the target sentences and were either unmodulated or 100 % sinusoidally amplitude modulated at 8 or 80 Hz. Target and masker were mixed together at SNRs ranging from −3 to −15 dB, and the mixture was lowpass-filtered at 4 kHz.
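The masker construction and mixing steps can be sketched in plain Python. Several details are assumptions rather than facts from the text: white Gaussian noise stands in for the speech-shaped noise, the sampling rate is 44.1 kHz, the SNR is defined on the RMS of the (modulated) masker, and a 101-tap windowed-sinc FIR filter stands in for whatever 4-kHz lowpass filter was actually used.

```python
import math
import random

FS = 44100  # sampling rate in Hz (assumption; not given in the text)

def rms(x):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def sam_noise(n, fs, fm_hz, rng):
    """Gaussian noise, 100 % sinusoidally amplitude-modulated at fm_hz.

    fm_hz=None gives unmodulated noise. White noise is a stand-in for the
    speech-shaped noise of the study (assumption).
    """
    carrier = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if fm_hz is None:
        return carrier
    # Modulation depth m = 1: the envelope 1 + sin(2*pi*fm*t) swings between 0 and 2.
    return [c * (1.0 + math.sin(2 * math.pi * fm_hz * i / fs))
            for i, c in enumerate(carrier)]

def mix_at_snr(target, masker, snr_db):
    """Scale the masker so that 20*log10(rms(target)/rms(masker)) = snr_db, then add."""
    gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
    return [t + gain * m for t, m in zip(target, masker)]

def lowpass_fir(x, fs, cutoff_hz, taps=101):
    """Zero-phase windowed-sinc (Hamming) FIR lowpass with unity DC gain; taps must be odd."""
    fc = cutoff_hz / fs  # cutoff in cycles per sample
    mid = (taps - 1) // 2
    h = []
    for k in range(taps):
        d = k - mid
        ideal = 2 * fc if d == 0 else math.sin(2 * math.pi * fc * d) / (math.pi * d)
        h.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * k / (taps - 1))))
    total = sum(h)
    h = [v / total for v in h]
    # 'Same'-length convolution centred on the filter midpoint (samples beyond the edges = 0).
    return [sum(h[k] * x[i + mid - k] for k in range(taps) if 0 <= i + mid - k < len(x))
            for i in range(len(x))]

# Example: 100 ms of a placeholder target mixed with 8-Hz SAM noise at -9 dB SNR.
rng = random.Random(1)
n = FS // 10
target = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stands in for a Matrix sentence
stimulus = lowpass_fir(mix_at_snr(target, sam_noise(n, FS, 8, rng), -9.0), FS, 4000)
```

Whether the SNR should be set on the modulated or the unmodulated masker is a design choice: 100 % modulation raises the masker RMS by a factor of √1.5, so the two conventions differ by about 1.8 dB.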
3.2 Results and Discussion
Identification scores were transformed into rationalized arcsine units (RAUs) and averaged across masker types and SNRs to reduce the effect of errors of measurement and to yield a composite intelligibility score representative of a range of test conditions.
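The text does not spell out the transform. The sketch below assumes the standard rationalized arcsine formula of Studebaker (1985), which maps X correct out of N items onto a scale that roughly matches percent correct in its mid-range, and then averages the resulting scores across conditions. The condition labels and counts are hypothetical.

```python
import math

def rau(correct, total):
    """Rationalized arcsine units for `correct` out of `total` items (Studebaker 1985)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))
    return 146 / math.pi * theta - 23

def composite_rau(conditions):
    """Mean RAU across test conditions; `conditions` maps label -> (correct, total)."""
    scores = [rau(c, n) for c, n in conditions.values()]
    return sum(scores) / len(scores)

# Hypothetical words-correct counts for one listener (50 words per condition).
listener = {
    ("unmodulated", -9): (30, 50),
    ("SAM 8 Hz", -9): (36, 50),
    ("SAM 80 Hz", -9): (28, 50),
}
print(round(composite_rau(listener), 1))
```

The RAU scale was designed to make score variance more uniform across the range of performance, which is the usual motivation for averaging in RAUs rather than in percent correct.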
Confirming previous results for audiometrically NH listeners (Füllgrabe et al. 2015), Reading-Span and SiN identification scores both declined significantly with age (Pearson’s r = −0.59 and −0.68, respectively; both p ≤ 0.001, one-tailed). The scatterplot in Fig. 2 shows that, across all ages, performance on the two tasks was significantly related (r = 0.64, p ≤ 0.001, one-tailed). This association remained significant after partialling out the effects of age and PTA (r = 0.39, p ≤ 0.001, one-tailed). This result contrasts with that of Besser et al. (2012), whose cohort included only a few (N = 8) older (≥ 60 years) participants, but is roughly consistent with that reported by Koelewijn et al. (2012) for a cohort of middle-aged and older (≥ 40 years) participants (see Fig. 1a).
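A partial correlation with two covariates (here age and PTA) can be computed from the pairwise Pearson correlations with the recursive partial-correlation formula, which gives the same value as correlating the residuals left after regressing each score on both covariates. The text does not say which procedure or software was used, so this is an illustrative sketch; the simulated data are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_r_two(x, y, z, w):
    """Correlation of x and y with both z and w partialled out (second-order)."""
    r = pearson
    r_xy_w = partial_r(r(x, y), r(x, w), r(y, w))
    r_xz_w = partial_r(r(x, z), r(x, w), r(z, w))
    r_yz_w = partial_r(r(y, z), r(y, w), r(z, w))
    return partial_r(r_xy_w, r_xz_w, r_yz_w)

# Simulated scores in which age and PTA affect the tasks, plus a shared factor.
rng = random.Random(42)
age = [rng.uniform(18, 91) for _ in range(132)]
pta = [0.1 * a + rng.gauss(0, 3) for a in age]
shared = [rng.gauss(0, 1) for _ in age]
wmc = [-0.05 * a + s + rng.gauss(0, 1) for a, s in zip(age, shared)]
sin_score = [-0.05 * a - 0.1 * p + s + rng.gauss(0, 1)
             for a, p, s in zip(age, pta, shared)]
print(round(pearson(wmc, sin_score), 2), round(partial_r_two(wmc, sin_score, age, pta), 2))
```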
To further investigate the age dependency of the association between WMC and SiN identification, participants were divided into four age groups: “Young” (range = 18–39 years, mean = 28 years; N = 32), “Middle-Aged” (range = 40–59 years, mean = 49 years; N = 26), “Young-Old” (range = 60–69 years, mean = 65 years; N = 40), and “Old-Old” (range = 70–91 years, mean = 77 years; N = 34). Separate correlational analyses for each age group revealed that the strength of the association differed across groups (see Fig. 2). Consistent with the overall trend seen in Fig. 1, the correlation was weak and non-significant in the Young group (r = 0.18, p = 0.162, one-tailed). In contrast, the correlations were moderately strong and significant in the three older groups (all r ≥ 0.44, all p ≤ 0.011, one-tailed). Comparing the correlation coefficients after applying Fisher’s r-to-z transformation revealed a significant difference between the Young and Old-Old groups (z = −1.75, p = 0.040, one-tailed). There was no evidence for a difference in variance between these two groups (Levene’s test, F(1,64) < 1, p = 0.365).
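The comparison of two correlations from independent groups can be sketched with Fisher's r-to-z transformation and the standard normal distribution. The Old-Old correlation is not reported in the text (only that it is at least 0.44), so the value 0.56 below is hypothetical, chosen for illustration; with the group sizes given above it yields z ≈ −1.75 and a one-tailed p ≈ 0.04.

```python
import math

def compare_independent_rs(r1, n1, r2, n2):
    """Fisher r-to-z test for correlations from two independent groups.

    Returns (z, p), where p is the one-tailed probability for H1: rho1 < rho2.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF evaluated at z
    return z, p

# Young group (r = 0.18, N = 32) vs. a hypothetical Old-Old r of 0.56 (N = 34).
z, p = compare_independent_rs(0.18, 32, 0.56, 34)
print(f"z = {z:.2f}, p = {p:.3f}")
```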
The age-related modulation of the strength of the correlation between WMC and SiN perception could be due to the different performance levels at which the age groups operated in this study (mean identification was 68, 60, 57, and 48 RAUs for the Young, Middle-Aged, Young-Old, and Old-Old group, respectively). However, when only performance at the two lowest SNRs was considered (mean = 46 RAUs, i.e., a level similar to that of the Old-Old group), WMC was still not associated with SiN identification in the young participants (r = 0.04, p = 0.405, one-tailed).
Taken together, the reported results fail to provide evidence that, in acoustically adverse listening situations, WMC (as measured by the Reading-Span test) is a reliable and strong predictor of SiN intelligibility in young listeners with normal hearing. The new data presented here suggest that WMC becomes more important with age, especially in the oldest participants. One possible explanation for this increasing cognitive involvement with age is the accumulation of age-related deficits in not only threshold-level but also suprathreshold auditory processing (e.g. sensitivity to temporal-fine-structure and temporal-envelope cues; Füllgrabe 2013; Füllgrabe et al. 2015), resulting in underspecified and degraded internal representations of the speech signal, which call for WM-based compensatory mechanisms to aid identification and comprehension.
Our findings do not detract from the practical importance of cognitive assessments in the prediction of SiN identification performance in older HI listeners and the possible interaction between cognitive abilities and hearing-aid processing. Nor do they argue against the involvement of cognition in speech and language processing in young NH listeners per se. First, individual differences in WMC have been shown to explain some of the variability in performance in more linguistically complex tasks (such as the comprehension of dynamic conversations; Keidser et al. 2015), presumably requiring memory or attentional/inhibitory processes associated with WMC (Conway et al. 2001; Kjellberg et al. 2008). Second, different cognitive measures, probing the hypothesized sub-processes of WM (e.g. inhibition, shifting, updating) or other domain-general cognitive primitives (e.g. processing speed), might prove to be better predictors of SiN processing abilities than the Reading-Span test.
In conclusion, and consistent with recent efforts to establish whether, and under which conditions, cognitive abilities influence the processing of spoken language (e.g. Fedorenko 2014; Heinrich and Knight, this volume), the current results caution against the assumption that WM necessarily supports SiN identification independently of the age and hearing status of the listener.
We would like to thank our colleagues who shared and reanalysed their data, and Dr. Oliver Zobay for his statistical advice. The MRC Institute of Hearing Research is supported by the Medical Research Council (grant number U135097130). This work was also supported by the Oticon Foundation (Denmark). CF is indebted to Prof. Brian Moore for granting access to the test equipment of his laboratory.
- Dubno JR, Ahlstrom JB (1997) Additivity of multiple maskers of speech. In: Jesteadt W (ed) Modeling sensorineural hearing loss. Lawrence Erlbaum Associates, Hillsdale, pp 253–272
- Koelewijn T, Zekveld AA, Festen JM, Rönnberg J, Kramer SE (2012) Processing load induced by informational masking is related to linguistic abilities. Int J Otolaryngol 2012:865731
- Kuik AM (2012) Speech reception in noise: on auditory and cognitive aspects, gender differences and normative data for the normal-hearing population under the age of 40. Bachelor’s thesis. Vrije Universiteit Amsterdam, Amsterdam
- Rönnberg J, Lunner T, Zekveld A, Sörqvist P, Danielsson H, Lyxell B, Dahlstrom O, Signoret C, Stenfelt S, Pichora-Fuller MK, Rudner M (2013) The Ease of Language Understanding (ELU) model: theoretical, empirical, and clinical advances. Front Syst Neurosci 7:31
- Souza P, Arehart K (2015) Robust relationship between reading span and speech recognition in noise. Int J Audiol 54:705–713
Open Access This book is distributed under the terms of the Creative Commons Attribution-Noncommercial 2.5 License (http://creativecommons.org/licenses/by-nc/2.5/) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
The images or other third party material in this book are included in the work’s Creative Commons license, unless indicated otherwise in the credit line; if such material is not included in the work’s Creative Commons license and the respective action is not permitted by statutory regulation, users will need to obtain permission from the license holder to duplicate, adapt or reproduce the material.