Introduction

Speech recognition becomes progressively more difficult and effortful with age. Although hearing loss is a primary contributing factor in older adults, age-related declines in speech recognition are observed independently of hearing loss (van Rooij and Plomp 1990; Dubno et al. 1997). Speech recognition is particularly affected in complex and demanding listening conditions in which word intelligibility is reduced (Sommers and Danielson 1999; Gordon-Salant and Fitzgibbons 2001, 2004; Dubno et al. 2005, 2006). Neural systems that support cognitive control become increasingly engaged in demanding listening conditions (Obleser et al. 2007). A failure of cognitive control, specifically a failure to inhibit irrelevant stimuli and focus on speech, has been proposed to explain both the speech recognition difficulties of older adults (Sommers 1997; Dywan et al. 2001) and age-related cognitive decline in general (Gazzaley and D’Esposito 2007).

The frontal lobe systems that support cognitive control show age-related structural and functional changes (Raz et al. 1997; Milham et al. 2002; Nielson et al. 2002; Cabeza et al. 2004; Tisserand et al. 2004; Colcombe et al. 2005). In particular, the anterior cingulate cortex (ACC) and middle frontal gyrus (MFG) exhibit age-related changes in activation during speech comprehension and memory retrieval tasks (Grady et al. 2005; Sharp et al. 2006). Many imaging studies of aging indicate that older adults exhibit increased frontal lobe activity during memory (Cabeza et al. 1997; McIntosh et al. 1999; Reuter-Lorenz et al. 2000; Rypma and D’Esposito 2000; Cabeza et al. 2002), perception (Grady et al. 1994; Fernandes et al. 2006; Moffat et al. 2006), and response inhibition tasks (Milham et al. 2002; Nielson et al. 2002). This increased frontal activity has been hypothesized (1) to be compensatory in high-functioning older adults (Cabeza et al. 2002), (2) to reflect a greater need for cognitive control in older than in younger adults (Dywan et al. 2002), and (3) to reflect the larger amount of irrelevant information that older adults retain in working memory compared with younger adults (Hasher and May 1999). Although one study has examined age-related changes in speech comprehension (Sharp et al. 2006), no imaging studies have examined age-related changes in word recognition. This study examined the extent to which frontal lobe regions exhibited age-related changes in activity during word recognition when words were filtered to parametrically vary their intelligibility. In addition, we hypothesized that structural declines in speech-responsive temporal lobe regions would predict increased reliance on frontal lobe systems for word recognition.

Materials and methods

Subjects

Fifteen adults, ranging in age from 21 to 75 years (mean 42.1, SD 18.7 years; nine female), participated in this study. Participants were recruited from the Medical University of South Carolina (MUSC) community and the Charleston, SC area through word of mouth and through a longitudinal study of age-related hearing loss (presbyacusis). Participants averaged 17.7 years of education (SD 2.0; 16 years is equivalent to a 4-year college degree). The aims of the study were explained to each participant, and informed consent approved by the MUSC Institutional Review Board was obtained. All participants had audiometric thresholds below 25 dB HL at octave frequencies from 250 to 3,000 Hz (ANSI 2004). In addition, threshold masking noise (described below) was used to control for individual differences in hearing thresholds below 25 dB HL.

Image acquisition and task design

A sparse sampling design was used to (1) limit the confounding influence of scanner noise on the stimuli and on neural responses to the stimuli, (2) provide time to generate a verbal response, and (3) provide time for participants to stabilize their heads before the next TR (Fridriksson et al. 2006). T2*-weighted functional images were acquired on a Philips 3T scanner using a single-shot echo-planar imaging (EPI) sequence that covered the whole brain (32 slices, 64 × 64 matrix, TR = 8 s, TE = 30 ms, slice thickness = 3.25 mm, acquisition time [TA] = 1,647 ms). One volume was collected per 8-s TR. T1-weighted images were also collected for analyses of brain structure (160 slices, 256 × 256 matrix, TR = 8.13 ms, TE = 3.7 ms, flip angle = 8°, slice thickness = 1 mm, no slice gap).
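
The timing constraints of the sparse design can be checked with simple arithmetic. The sketch below uses the TR, TA, and word-onset values reported in this section; it assumes (this is not stated in the text) that each 1,647-ms acquisition falls at the start of its TR and that one trial occupies each TR. The function names are illustrative only.

```python
"""Sparse-sampling trial timing (illustrative sketch).

ASSUMPTIONS (not stated in the text): each 1,647-ms acquisition (TA)
starts at the beginning of its 8-s TR, and one trial occupies each TR.
"""

TR_S = 8.0          # repetition time in seconds (from the text)
TA_S = 1.647        # acquisition time in seconds (from the text)
WORD_ONSET_S = 2.5  # word onset within each TR in seconds (from the text)
N_TRIALS = 160      # 40 words x 4 filter conditions


def silent_gap(tr_s: float, ta_s: float) -> float:
    """Scanner-quiet time in each TR, assuming acquisition starts the TR."""
    return tr_s - ta_s


def onset_to_next_acquisition(tr_s: float, onset_s: float) -> float:
    """Delay from word onset to the start of the next volume."""
    return tr_s - onset_s


def trial_schedule(n_trials: int, tr_s: float, onset_s: float) -> list[tuple[float, float]]:
    """(TR start, word onset) in seconds for each trial."""
    return [(i * tr_s, i * tr_s + onset_s) for i in range(n_trials)]


if __name__ == "__main__":
    # The word must fall in the quiet window after the acquisition ends.
    assert TA_S < WORD_ONSET_S < TR_S
    print(f"Quiet period per TR: {silent_gap(TR_S, TA_S):.3f} s")
    print(f"Onset to next volume: {onset_to_next_acquisition(TR_S, WORD_ONSET_S):.1f} s")
    print(f"Total trial time: {N_TRIALS * TR_S / 60:.1f} min")
```

Under these assumptions the scanner is quiet for about 6.35 s of each TR, the word at 2.5 s falls inside that quiet window, and the next volume begins 5.5 s after word onset, roughly when the hemodynamic response is typically near its peak. The 160 trials occupy about 21.3 min of TRs, excluding dummy and discarded scans.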

Each participant performed a word recognition task in which they listened to 40 words, each presented in all four low-pass filter conditions in random order, for a total of 160 event-related trials (upper cutoff frequencies = 400, 1,000, 1,600, and 3,150 Hz; the lower cutoff frequency was fixed at 200 Hz; audio examples are presented in Supplementary Materials). We chose filtering, rather than multi-talker babble or noise-vocoded stimuli, to vary word intelligibility as a preliminary step toward understanding the cortical representation of speech in people with high-frequency hearing loss, which will be addressed in future studies. The words were nouns selected from a list of 400 monosyllabic consonant–vowel–consonant words used by Dirks et al. (2001) and were presented at 75 dB SPL. Lexical difficulty of the words was normally distributed; lexical difficulty reflects the combined influence of word frequency (mean = 43.2, SD = 77.4), the number of other similar-sounding words (mean = 18.3, SD = 6.6), and the frequency of those similar-sounding words (mean = 243.5, SD = 444.1) (Sommers and Danielson 1999). One goal of this study was to examine age-related changes in central representations of speech while controlling for peripheral auditory factors that might affect individual variability in cortical responsiveness to speech. To that end, we included only adults with clinically normal hearing and presented a broadband masker throughout each 8-s TR. The masker was digitally generated, and its spectrum was adjusted at one-third-octave intervals to produce equivalent masked thresholds for all subjects, thereby controlling for individual differences in hearing thresholds. Band levels of the noise were set to achieve masked thresholds of 20–25 dB HL from 200 to 3,150 Hz, 30 dB HL at 4,000 and 5,000 Hz, and 40 dB HL at 6,300 Hz.
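
The filter conditions can be illustrated with a short NumPy sketch. The text gives only the cutoff frequencies (a fixed 200-Hz lower edge and upper edges of 400, 1,000, 1,600, and 3,150 Hz); the filter type, order, and sampling rate are not reported, so the ideal (brick-wall) FFT filter, the 16-kHz sampling rate, and the function names below are assumptions chosen for clarity. A practical stimulus pipeline might prefer a filter with a gradual roll-off to limit ringing.

```python
import numpy as np

# Cutoffs from the text; everything else here is an illustrative assumption.
LOWER_CUTOFF_HZ = 200.0
UPPER_CUTOFFS_HZ = (400.0, 1000.0, 1600.0, 3150.0)


def bandlimit(signal: np.ndarray, fs: float, low_hz: float, high_hz: float) -> np.ndarray:
    """Ideal FFT filter: zero every component outside [low_hz, high_hz]."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=signal.size)


def make_conditions(word: np.ndarray, fs: float) -> dict[float, np.ndarray]:
    """One filtered copy of a word waveform per upper cutoff frequency."""
    return {fc: bandlimit(word, fs, LOWER_CUTOFF_HZ, fc) for fc in UPPER_CUTOFFS_HZ}


if __name__ == "__main__":
    fs = 16000  # hypothetical sampling rate (Hz)
    t = np.arange(fs) / fs
    # Stand-in "word": a 300-Hz plus a 2,000-Hz tone, 1 s long.
    word = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 2000 * t)
    for fc, filtered in make_conditions(word, fs).items():
        rms = np.sqrt(np.mean(filtered ** 2))
        print(f"upper cutoff {fc:>6.0f} Hz: RMS = {rms:.3f}")
```

In this toy input the 2,000-Hz component survives only the 3,150-Hz condition, which illustrates how lower upper cutoffs remove progressively more high-frequency information.
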
Pilot testing of the word recognition experiment with 10 young adults in a double-walled sound-treated room demonstrated that word recognition decreased linearly with decreasing cutoff frequency (r = 0.99), with no ceiling effect in the 3,150-Hz condition. Nearly identical word recognition results were observed in the experimental sample of 15 subjects (ages 21–75) tested during MR scanning. E-Prime software (Psychology Software Tools Inc.) and an IFIS-SA control system (Invivo Corp.) were used to present the words to subjects in the scanner. The broadband noise was presented from a separate PC throughout the experiment. The words were mixed with the broadband noise at precisely 2.5 s into each 8-s TR using a standard audio mixer and then delivered to the subject through custom-made piezoelectric insert earphones (Sensimetrics Corp.). Signal levels were calibrated using a Brüel & Kjær sound-level meter (Type 2231). Participants were instructed to listen and to respond with the word they heard, or with “nope” if they could not recognize the word, ensuring that a motor response was produced on each trial. Each response was scored as correct, incorrect, or “nope” by two raters (M.E. and A.W.). An overt oral response was chosen so that the results could be related directly to audiologic assessment of word recognition and because speech production tasks have been used successfully in other language studies (Gracco et al. 2005; Shuster and Lemieux 2005; Fridriksson et al. 2006). Figure 1 summarizes the experimental design used in this study.
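
The three-way response coding and the per-condition percentage correct (as plotted in Fig. 2A) can be sketched as follows. In the study, two raters scored each spoken response; the exact, case-insensitive string match used here is a hypothetical stand-in for their judgment, and the function names are illustrative.

```python
from collections import Counter, defaultdict

NOPE = "nope"  # the response participants gave when they could not recognize a word


def score_response(target: str, response: str) -> str:
    """Classify one response as 'correct', 'nope', or 'incorrect'.

    Exact string matching is a simplifying assumption; the study used
    two human raters.
    """
    r = response.strip().lower()
    if r == NOPE:
        return "nope"
    return "correct" if r == target.strip().lower() else "incorrect"


def percent_correct_by_condition(trials):
    """trials: iterable of (cutoff_hz, target, response) tuples.

    Returns {cutoff_hz: percent correct} across all trials in each condition.
    """
    counts = defaultdict(Counter)
    for cutoff, target, response in trials:
        counts[cutoff][score_response(target, response)] += 1
    return {
        cutoff: 100.0 * c["correct"] / sum(c.values())
        for cutoff, c in counts.items()
    }
```

Counting "nope" separately from other errors keeps both response types available for the age analyses reported in the Results.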

FIG. 1

Schematic of the experimental design used in this study for each trial.

Data analyses

Image pre-processing was performed using SPM5 algorithms (http://www.fil.ion.ucl.ac.uk/spm). Each participant’s native-space images were realigned to the first volume and unwarped to correct for head movement and susceptibility distortions. Image volumes, slices, and voxels with significant artifact were identified with the ArtRepair toolbox (http://cibsr.stanford.edu/tools/ArtRepair/ArtRepair.htm) on the basis of scan-to-scan motion (1 SD change in head position) and outliers relative to the global mean signal (3 SD from the global mean). On average, three image volumes (SD 1.6) per subject were excluded for artifact. The images were normalized to the ICBM EPI template and smoothed with an 8-mm Gaussian kernel so that the data better conformed to the normality assumptions of parametric testing. A first-level fixed-effects analysis of each individual’s images estimated differences in activity for correct compared with incorrect word recognition. Because subject performance depended on filter condition, modeling both factors together could have introduced multicollinearity; separate first-level fixed-effects analyses were therefore performed to identify brain regions whose activity varied parametrically across the four filter conditions. As described below, age had no effect on word recognition, so all trials were included in the parametric filter-condition analysis, which identified brain regions that were increasingly responsive with increasing word intelligibility. In addition to the two dummy scans omitted from each run, the first real scan of each run was omitted to limit longitudinal magnetization effects at the beginning of each run. The design regressors were convolved with the SPM5 canonical hemodynamic response function, and the data were high-pass filtered at 128 s.
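
The parametric filter-condition analysis and the multicollinearity concern can both be illustrated with trial-wise regressors. This sketch codes the four filter conditions by rank and mean-centers the result, similar to how SPM treats parametric modulators; the rank coding, the example trial lists, and the function names are assumptions, since the text does not say how the levels were scaled. The correlation helper shows why accuracy and filter condition were modeled separately: when accuracy tracks the filter level, the two regressors are strongly correlated.

```python
import numpy as np


def mean_centered_modulator(cutoffs_hz):
    """Trial-wise parametric modulator, mean-centered across trials.

    Levels are coded by rank (lowest cutoff -> 1, ..., highest -> 4);
    this coding is an assumption, not taken from the text.
    """
    levels = sorted(set(cutoffs_hz))
    ranks = np.array([levels.index(c) + 1 for c in cutoffs_hz], dtype=float)
    return ranks - ranks.mean()


def regressor_correlation(a, b) -> float:
    """Pearson correlation between two trial-wise regressors."""
    return float(np.corrcoef(a, b)[0, 1])


if __name__ == "__main__":
    cutoffs = [400, 1000, 1600, 3150] * 10  # 40 hypothetical trials
    accuracy = [0, 0, 1, 1] * 10            # hypothetical: correct only at high cutoffs
    modulator = mean_centered_modulator(cutoffs)
    print(modulator[:4])
    print(f"r(accuracy, filter) = {regressor_correlation(accuracy, modulator):.2f}")
```

Both regressors would still need convolution with the hemodynamic response function before entering a design matrix; that step is omitted here.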

Second-level random-effects analyses were performed to examine age-related changes in brain regions engaged during correct versus incorrect responses, as well as age-related changes across the filter conditions. A joint statistical threshold of peak voxel p < 0.01 and cluster extent p < 0.01, based on the SPM results output, was applied to all second-level analyses to provide sensitivity to both sharply peaked and broadly distributed effects (Poline et al. 1997). All peak voxels reported in this study have p < 0.001. A gray matter mask, comprising voxels with at least a 20% probability of gray matter across the sample (derived from the subjects’ normalized and segmented gray matter images), was used to restrict the analyses to gray matter and to limit the number of statistical comparisons.
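
The gray matter mask can be expressed as a threshold on the group-average tissue probability. This NumPy sketch assumes that “at least a 20% probability of gray matter across the sample” means the voxel-wise mean of the subjects’ normalized gray matter probability maps is at least 0.2; the array layout and function name are hypothetical, and reading and writing the image files is omitted.

```python
import numpy as np


def group_gm_mask(gm_prob_maps: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Boolean mask of voxels whose mean gray matter probability >= threshold.

    gm_prob_maps: shape (n_subjects, x, y, z), each subject's normalized,
    segmented gray matter probabilities in [0, 1]. Interpreting "20%
    probability across the sample" as a group mean is an assumption.
    """
    if gm_prob_maps.ndim != 4:
        raise ValueError("expected an array of shape (n_subjects, x, y, z)")
    return gm_prob_maps.mean(axis=0) >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = rng.uniform(0.0, 1.0, size=(15, 4, 4, 4))  # 15 hypothetical subjects
    mask = group_gm_mask(maps)
    # Voxels outside the mask are excluded, reducing the number of tests.
    print(f"{int(mask.sum())} of {mask.size} voxels retained")
```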

Voxel-based morphometry was performed using SPM5 to determine the extent to which age-related changes in brain activation could be attributed to structural declines. The T1-weighted images were normalized, segmented, bias-field corrected, and modulated using an integrated generative model and the ICBM a priori gray matter, white matter, and CSF templates [unified segmentation (Ashburner and Friston 2005)]. The normalized, segmented, and modulated images were then smoothed with a 10-mm kernel so that the data better approximated a normal distribution. A binary mask of the functional results for increasing intelligibility was created to determine the extent to which speech-responsive brain regions exhibited age-related declines in gray matter volume. The average gray matter volume within the speech-responsive voxels that showed age-related declines was extracted using MarsBaR (Brett 2002). These values were used to determine the extent to which age-related changes in left MFG activation were related to declining gray matter volume in speech-responsive brain regions. An estimate of total gray matter volume was obtained from the modulated and normalized gray matter images using custom MATLAB (The MathWorks, Inc.) code (http://www.cs.ucl.ac.uk/staff/G.Ridgway/vbm/get_totals.m). This estimate was entered into partial correlations to determine whether (1) age-related gray matter volume changes specific to speech-responsive brain regions or (2) global declines in gray matter volume predicted the age-related changes in left MFG activation described below.
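
The partial correlations described above ask whether regional gray matter volume predicts left MFG activation beyond what total gray matter volume explains. A standard-library sketch follows; the variable roles (x = regional gray matter volume, y = left MFG contrast value, z = total gray matter volume) are illustrative labels, and the text does not state which software computed the partial correlations. Correlating the residuals after regressing z out of both x and y yields the same value as the closed-form first-order partial correlation formula.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation r_xy.z (closed-form formula)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def residualize(v, z):
    """Residuals of v after an ordinary least-squares fit on z (with intercept)."""
    mv, mz = fmean(v), fmean(z)
    beta = sum((a - mz) * (b - mv) for a, b in zip(z, v)) / sum((a - mz) ** 2 for a in z)
    return [b - mv - beta * (a - mz) for a, b in zip(z, v)]
```

For hypothetical per-subject lists `regional_gm`, `mfg_contrast`, and `total_gm`, `partial_correlation(regional_gm, mfg_contrast, total_gm)` should match `pearson(residualize(regional_gm, total_gm), residualize(mfg_contrast, total_gm))` up to rounding.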

Results

Word recognition varied linearly with filter cutoff frequency (r = 0.99; Fig. 2A) and did not vary with age (Fig. 2B), indicating that word recognition was equivalent across ages. In addition, age was not significantly associated with the percentage of incorrect responses (r = 0.26, ns) or “nope” responses (r = −0.37, ns). Across subjects, increasing word intelligibility was associated with increasing activity in temporal lobe regions previously shown to be responsive to speech (Fig. 3; Binder et al. 2000; Obleser et al. 2006; Scott et al. 2006; Obleser et al. 2007). In particular, the bilateral anterior superior temporal sulcus/superior temporal gyrus (STS/STG) and the left hippocampus/entorhinal cortex (HC) exhibited increasing activity with increasing word intelligibility.

FIG. 2

Word recognition across the sample and by age. A Percentage of correct responses during the fMRI word recognition experiment. Based on pilot testing, the filter cutoff frequencies were chosen to produce a linear change in word recognition. Each colored symbol represents one subject’s performance in one filter condition, as plotted in B. B There were no age-related changes in word recognition for any of the four filter conditions.

FIG. 3

Parametric increases in regional brain activity with increasing word intelligibility. These results, including the medial prefrontal and posterior cingulate regions, are consistent with a previous study that demonstrated increasing activity in these regions with increasing intelligibility of sentences with high semantic predictability (Obleser et al. 2007).

Although word recognition did not change with age in any filter condition, age-related changes in brain activation were observed during word recognition. Older adults were more likely than younger adults to engage the bilateral ACC, left MFG, bilateral calcarine cortex, and left ventral occipital regions for correct compared with incorrect word recognition (Fig. 4, Supplemental Table 1). In addition, older adults were more likely to engage the left MFG in the most intelligible condition (3,150 Hz), whereas younger adults were more likely to engage this region in the less intelligible conditions (400, 1,000, and 1,600 Hz) (Supplemental Fig. 1). These results indicate that older adults increasingly engaged the left MFG in easy listening conditions associated with correct word recognition, whereas younger adults increasingly engaged the left MFG in listening conditions that were likely to result in recognition errors.

FIG. 4

Age-related increases in activity for correct compared with incorrect word recognition across the four filter conditions. A Clusters exhibiting age-related changes in the left MFG, ACC, and visual cortex. B The results in A, particularly in the left MFG, arose because older adults exhibited increased left MFG activity for correct word recognition, whereas younger adults exhibited increased left MFG activity for incorrect word recognition. The y-axis represents the average SPM contrast value from the MFG cluster.

In contrast to the age-related results for the left MFG, the entire sample demonstrated increased right frontal lobe activity for incorrect compared to correct word recognition and with increasingly filtered words (Supplemental Fig. 2A, B). In particular, there was increased right MFG and IFG activity for incorrect compared to correct word recognition and for the 400 Hz compared to 3,150 Hz filtered word conditions. Age was not associated with the contrast values from these right frontal regions (Supplemental Fig. 2C, D; Supplemental Tables 2 and 3).

Voxel-based morphometry correlation analyses demonstrated age-related declines within the gray matter regions that exhibited parametric increases in activity with increasing word intelligibility. Of the 7,394 voxels included in this analysis, significant age-related declines in gray matter volume were observed in 15 left STS/STG voxels, 76 left HC voxels, and 48 left supramarginal gyrus/postcentral sulcus (SMG/PCS) voxels (FDR p < 0.05; Fig. 5A). Gray matter volume in these regions was also associated with the age-related changes in MFG activation for the correct–incorrect comparison (Fig. 5B). In particular, gray matter volume in the left STS/STG and left HC was significantly correlated with left MFG activation for the correct–incorrect and 400–3,150 Hz comparisons after controlling for total gray matter volume (Table 1). These results indicate that regionally specific structural declines in the left STS/STG and left HC covary with the age-related functional changes in the left MFG.
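
The voxel counts above survive a false discovery rate (FDR) correction across the 7,394 tested voxels. The text does not name the FDR variant; the Benjamini-Hochberg step-up procedure sketched below is the most widely used one and is an assumption here, as are the function name and the example p-values.

```python
def fdr_bh(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans aligned with p_values: True where the test
    is declared significant at false discovery rate q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    # Declare significant every test whose rank is <= k_max.
    significant = [False] * m
    for i in order[:k_max]:
        significant[i] = True
    return significant


if __name__ == "__main__":
    # Hypothetical voxel p-values, not data from this study.
    print(fdr_bh([0.013, 0.014, 0.036, 0.9]))  # -> [True, True, True, False]
```

Because the procedure steps up from the largest passing rank, the first p-value (0.013) is declared significant even though it exceeds its own rank threshold of 0.0125.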

FIG. 5

Age-related changes in gray matter volume predict left MFG activation. A Age-related changes in the left STS/STG (−64, −10, −4) and left HC (−18, 0, −16) regions that were responsive to increasing speech intelligibility and predicted left MFG activation during word recognition after controlling for total gray matter volume. The left SMG/PCS (−42, −40, 54; not shown) did not significantly predict left MFG activation after controlling for total gray matter volume. B Gray matter volume in the left STS/STG (red diamonds) and left HC (yellow squares) predicts left MFG activation from the correct–incorrect comparison (y-axis).

TABLE 1 Gray matter volume in speech responsive regions predicts age-related changes in left MFG activation during word recognition

Discussion

Age-related changes in left MFG and ACC activity were observed during word recognition in adults with clinically normal hearing. These changes depended on listening difficulty, suggesting that cognitive control systems are increasingly used with increasing age to make correct word recognition responses in easy listening conditions. Extending perception and memory studies that demonstrate age-related increases in left MFG activity, the results of this study indicate that age-related structural declines in speech-responsive temporal lobe regions are tightly correlated with increased left MFG activity. These results suggest that declining structural integrity of temporal lobe regions that support speech recognition leads to increased reliance on cognitive control systems to recognize words.

Our interpretation that, with increasing age, greater cognitive control is required in the easiest word recognition conditions is consistent with the age-related gray matter volume declines in temporal lobe regions that were responsive to increasing word intelligibility. Declining structural integrity of the left STS/STG and left HC predicted the age-related increase in reliance on left MFG activity for word recognition, even after controlling for global declines in gray matter volume. This result suggests that people with declining structural integrity of speech-responsive brain regions rely on cognitive control systems to perform word recognition tasks. Unlike the older adults in this study, older adults with speech recognition difficulties may be unable to rely on frontal lobe systems to compensate for degraded speech representations (Tremblay et al. 2002). Impairments in cognitive control may also explain why many older adults experience dissatisfaction and limited benefit from hearing aids.

Age-related changes in ACC and MFG activation have been observed for perceptual tasks such as face and spatial-location matching (Grady et al. 1994; Fernandes et al. 2006; Moffat et al. 2006), as well as for memory (Cabeza et al. 1997; Grady et al. 1999; McIntosh et al. 1999; Reuter-Lorenz et al. 2000; Rypma and D’Esposito 2000; Cabeza et al. 2002; Grady et al. 2006; Grady et al. 2007) and response inhibition tasks (Milham et al. 2002; Nielson et al. 2002). We have interpreted the findings of this study as reflecting age-related changes in cognitive control, consistent with the functional roles attributed to the MFG. Cognitive control is a broad construct, however, and can include response selection and suppression, directing attention, performance monitoring, and encoding and memory retrieval.

The age-related changes in ACC activation suggest that participants engaged a system consistently shown to be important for conflict monitoring and error detection. In particular, the ACC is hypothesized to provide the MFG with information about conflicting or ambiguous perceptual input so that the MFG can guide selection of an appropriate response (Botvinick et al. 1999; Kerns et al. 2004; Ridderinkhof et al. 2004). Age-related increases in cognitive control for the easiest listening condition would therefore up-regulate the ACC as well as the MFG. This interpretation is consistent with evidence for age-related changes in the ACC during speech comprehension (Sharp et al. 2006). In this context, older adults may monitor their performance more than younger adults in the easiest listening conditions, whereas younger adults monitor their performance more in the more difficult listening conditions.

Increased task difficulty could also up-regulate conflict monitoring systems. ACC and MFG regions are increasingly engaged with increasing task difficulty (Barch et al. 1997; Mattay et al. 2006; Tregellas et al. 2006). Age-related changes may be observed in these regions because relatively easy tasks may be more challenging for older than for younger adults. For example, Grady et al. (1994) demonstrated age-related increases in left MFG activity for face matching and spatial-location matching tasks. These age-related changes appeared to be diminished for face matching and spatial-location matching conditions that required longer reaction times, suggesting that those conditions were more difficult. One strength of our parametric design was that it showed that age-related differences in left MFG activity depended on word intelligibility. Older adults in the sample demonstrated increased left MFG activity for the easiest listening condition, whereas younger adults demonstrated comparatively increased left MFG activity for the most difficult conditions. This result is important because it indicates that age-related changes in the blood oxygen level-dependent (BOLD) signal vary with the difficulty of the cognitive task. Similar observations have been reported in memory experiments in which older adults exhibit greater MFG activity in relatively easy memory load conditions, whereas younger adults exhibit increased activity in these regions with increasing memory load (Mattay et al. 2006). These results have been interpreted as a decline in neural efficiency that requires recruitment of additional resources in relatively easy task conditions (Reuter-Lorenz 2002; Mattay et al. 2006).

An alternative explanation for our age-related MFG results is that older and younger adults differed in their use of a short-term memory strategy to perform the word recognition task. Older adults often fail to inhibit irrelevant or extraneous information, which has been associated with a richer array of information in working memory compared with younger adults (Hasher and May 1999). The age-related changes in the left MFG may reflect the engagement of frontal memory systems for previously presented words or the refreshing of word representations in the left MFG (Brodmann areas 10 and 46; Johnson et al. 2005). The association between increasing age and activation of posterior STG/STS regions implicated in phonological working memory (Hickok et al. 2000; Buchsbaum et al. 2005) for the 3,150–400 Hz comparison (Supplemental Fig. 1) supports the interpretation that short-term memory systems are increasingly engaged in easy listening conditions with increasing age. In addition, declines in hippocampal gray matter volume, within regions engaged by the word recognition task, were significantly correlated with left MFG activation. This observation is consistent with evidence of age-related increases in correlated activity between hippocampal and left MFG regions during memory encoding (Springer et al. 2005). Declining hippocampal integrity may increase reliance on left MFG regions for even the simplest perceptual and memory tasks.

Given the age range of our participants (21–75 years), the results of this study suggest that increasing engagement of the left MFG during word recognition begins in middle age. Importantly, this age-related change in activity was tightly correlated with declines in gray matter volume in regions that support word recognition and memory. Declining structural integrity of speech-responsive brain regions appears to result in reliance on frontal lobe cognitive control systems to recognize speech in easy listening conditions. These results are consistent with a large body of evidence that older adults rely on prefrontal cortex for memory and perceptual tasks, and for the first time they directly implicate structural decline in the hippocampus and anterior STG, regions consistently engaged in memory and speech recognition tasks. We hypothesize that the increased need for cognitive control for successful word recognition underlies the fatigue that many older adults with hearing loss experience during normal conversation, and that perturbation of this cognitive control system results in a failure to inhibit competing sensory stimuli and impaired speech recognition.