INTRODUCTION

Age-related decline in sensory and cognitive capabilities, more specifically of fluid intelligence, can affect the quality of life negatively (Dalton et al. 2003). On the sensory side, visual and auditory capabilities (Swenor et al. 2013), and on the cognitive side, cognitive processing speed (Salthouse 1996; Kennedy and Raz 2009), short- and long-term memory functioning (Kemper et al. 2003; Charlton et al. 2010; Tonoki and Davis 2012), and top-down suppression (Gazzaley et al. 2008; Wild-Wall and Falkenstein 2010; Janse 2012) decline with increasing age. The present study shows that older adults have mechanisms to effectively compensate for the negative age-related changes to enhance perception, perhaps by relying more on the crystallized intelligence that can stay the same or become better over a lifetime (Cattell 1971; Baltes 1993).

To show such compensation, the phonemic restoration paradigm was used in this study (Warren 1970; Powers and Wilcox 1977; Verschuure and Brocaar 1983; Bashford et al. 1992). Here, missing parts of speech are perceptually filled in by the listener, which can improve speech intelligibility in noisy environments. Such restoration depends on an interaction between knowledge-driven and signal-driven processes (Başkent 2012), where linguistic skills, world knowledge, experience, expectations, situational context and Gestalt rules of perceptual grouping are used to form perceptual hypotheses based on sentential context and semantic, syntactic, spectral and temporal cues from the speech signal (Samuel 1981; Bashford et al. 1992; Srinivasan and Wang 2005; McDermott and Oxenham 2008; Riecke et al. 2009; Groppe et al. 2010). Degraded speech, in which parts of the signal are missing, reduces speech redundancy and increases reliance on the cognitive and linguistic mechanisms for top-down repair (Stenfelt and Rönnberg 2009).

A positive compensatory effect of phonemic restoration has previously been demonstrated with young normal-hearing adults only (Başkent et al. 2009; Başkent 2012; Benard and Başkent 2013b). The effect diminished with younger adults tested with simulated hearing devices and with older adults who had hearing loss (Başkent et al. 2010; Başkent 2010, 2012). Hence, the question remained whether older individuals can benefit from the restoration mechanisms using their knowledge-based linguistic skills. In the present study, by presenting speech at varying levels of degradation, it was explored if, and when, older listeners can engage these compensatory restoration mechanisms. Additionally, as a way of inspecting the effects of age-related cognitive slowing, restoration was measured at varying speech rates. By selecting older participants with normal hearing, potential effects of age-related sensory decline were eliminated (Hoffman et al. 2012), which enabled us to focus on cognitive aspects. No further pre-selection was applied, so that the older group would be representative rather than specifically trained or high-performing.

METHODS

Participants

Appropriate selection of the older group was crucial for the study to ensure that the results were mainly caused by aging, and not contaminated by age-related hearing loss and the resulting limitations in audibility. The older participants were selected to be of sufficiently advanced age to show age effects on speech perception (Mahncke et al. 2006; He et al. 2008; Wong et al. 2009; Adams et al. 2012; Füllgrabe 2013) while displaying nearly normal hearing (Hoffman et al. 2012). As a result, of the 21 older participants who applied for the study and self-reported normal hearing, 12 (57 %) met our inclusion criterion for normal hearing during the screening.

Twelve young (six female, six male, mean age 22 years, ranging from 19 to 26 years) and 12 older (four female, eight male, mean age 66 years, ranging from 62 to 77 years) native Dutch speakers participated in this study. All participants were tested for normal hearing (defined as four-tone pure-tone average of the thresholds measured at the audiometric frequencies of 500, 1,000, 2,000 and 4,000 Hz less than or equal to 20 dB HL for the better ear, based on Stephens (1996); see Fig. 1 for individual hearing thresholds). Participants further reported having no hearing problems and no language disorders (e.g., dyslexia). They were naive to the testing materials and methods, and unaware of the purpose of the study. Written informed consent was obtained before participation and the study was approved by the Ethics Committee of the University of Groningen, Department of Psychology. Participation was financially compensated.
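The inclusion criterion above can be sketched in a few lines. This is a minimal illustration, not the screening software used in the study; the function names and the example thresholds are hypothetical.

```python
from statistics import mean

PTA_FREQS_HZ = (500, 1000, 2000, 4000)  # the four audiometric frequencies
PTA_LIMIT_DB_HL = 20                    # inclusion limit for the better ear


def pure_tone_average(thresholds_db_hl):
    """Mean threshold (dB HL) over the four audiometric frequencies."""
    return mean(thresholds_db_hl[f] for f in PTA_FREQS_HZ)


def has_normal_hearing(left, right):
    """Apply the criterion to the better ear, i.e. the ear with the lower PTA."""
    better_ear_pta = min(pure_tone_average(left), pure_tone_average(right))
    return better_ear_pta <= PTA_LIMIT_DB_HL


# Hypothetical thresholds, not data from the study.
left = {500: 15, 1000: 20, 2000: 25, 4000: 30}   # PTA = 22.5 dB HL
right = {500: 10, 1000: 15, 2000: 20, 4000: 25}  # PTA = 17.5 dB HL
qualifies = has_normal_hearing(left, right)      # better ear is below the limit
```

Because only the better ear is evaluated, a listener whose poorer ear exceeds 20 dB HL can still qualify, as in the hypothetical case above.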

FIG. 1

Individual hearing thresholds shown for the better ear for young and older adults.

Apparatus

The experiment was programmed in Matlab (version 7.10.0.499, 32-bit) and run under Mac OS X (10.5.8) on a Mac Pro. The audio stimuli were diotically presented through Sennheiser HD 600 headphones, connected to the S/PDIF output of an Echo Audiofire 4 external soundcard and a Lavry Engineering DA10 digital-to-analog converter. Participants were seated in a sound-isolated testing booth. Participant responses were recorded with an Alesis Palm Track digital sound recorder in MP3 format for offline double-checking of the scores.

Stimuli

The speech corpus consisted of meaningful and everyday Dutch sentences sampled at 44.1 kHz (Versfeld et al. 2000). Originally, the authors created a subset of 78 lists (39 lists spoken by one male speaker and 39 spoken by one female speaker) of 13 sentences each, with the purpose of measuring speech reception thresholds in stationary speech-shaped noise. Each sentence in the corpus is syntactically and grammatically correct. Each sentence consists of four to nine words and each word consists of three or fewer syllables. In the present study, of the original 39 balanced lists of 13 sentences, spoken by the male talker, 33 lists were used. The presentation order was randomized for each participant.

Figure 2 schematically shows the steps that were involved in creating the speech stimuli. The first step was to change the speech rate (SR; 0.5×, 1× and 2× the original speech rate) by compressing or expanding the sentence recordings without altering the voice pitch. For this purpose, the pitch-synchronous-overlap-add (PSOLA) method (Moulines and Charpentier 1990) was applied in PRAAT (version 5.3.12; Boersma 2002) with the default settings (time steps of 10 ms, minimum pitch of 75 Hz and maximum pitch of 600 Hz). The second step was to introduce periodic gaps at varying interruption rates (logarithmic steps; 0.625, 1.25, 2.5, 5, 10 and 20 Hz), by modulating the time-modified sentences with a periodic square-wave signal. The square-wave period depended on the interruption rate, e.g., an interruption rate of 5 Hz produced a period of 200 ms, with an ON and OFF phase, both with an equal length of 100 ms (duty cycle of 50 %). A raised cosine ramp of 5 ms was placed at each onset and offset to prevent audible distortions due to spectral splatter.
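The second step can be sketched as follows. This is an illustrative reconstruction, not the Matlab code used in the study: the function name make_gate, the placement of the ramps inside the ON phase, and the sine tone standing in for a sentence recording are all assumptions.

```python
import math


def make_gate(n_samples, fs, rate_hz, duty=0.5, ramp_ms=5.0):
    """Periodic on/off gate with raised-cosine ramps.

    Assumption: the ramps sit inside the ON phase; the text does not
    say whether they fall inside or around it.
    """
    period = fs / rate_hz          # samples per interruption cycle
    on_len = duty * period         # samples in the ON phase
    ramp = ramp_ms * 1e-3 * fs     # samples per ramp
    gate = []
    for n in range(n_samples):
        pos = n % period           # position within the current cycle
        if pos >= on_len:          # OFF phase: silent gap
            value = 0.0
        elif pos < ramp:           # onset ramp, rising from 0 to 1
            value = 0.5 * (1.0 - math.cos(math.pi * pos / ramp))
        elif pos > on_len - ramp:  # offset ramp, falling from 1 to 0
            value = 0.5 * (1.0 - math.cos(math.pi * (on_len - pos) / ramp))
        else:                      # steady part of the ON phase
            value = 1.0
        gate.append(value)
    return gate


# Interruption rates in logarithmic (doubling) steps, in Hz.
rates_hz = [0.625 * 2 ** k for k in range(6)]

fs = 44100                         # sampling rate of the corpus
period_ms = 1000.0 / 5.0           # 5 Hz -> 200 ms period
on_ms = 0.5 * period_ms            # 50 % duty cycle -> 100 ms ON, 100 ms OFF

# Stand-in "speech": one second of a 220 Hz tone (hypothetical test signal).
speech = [math.sin(2 * math.pi * 220 * n / fs) for n in range(fs)]
gate = make_gate(len(speech), fs, rate_hz=5.0)
interrupted = [s * g for s, g in zip(speech, gate)]
```

With ramps placed inside the ON phase, the fully audible portion of each 100 ms segment is 90 ms; placing the ramps around the ON phase instead would lengthen each segment slightly.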

FIG. 2

Schematic representation of stimulus construction. Blue and red lines represent speech and filler noise, respectively. At step 1, the speech rate of the original sentence recordings was altered, by slowing down (speech rate=0.5) or speeding up (speech rate=2) with a factor of 2. At step 2a, periodic silent interruptions were inserted in the sentences at various interruption rates (0.625, 1.25, 2.5, 5, 10 and 20 Hz). At step 2b, the silent gaps were filled with the noise bursts.

To measure the phonemic restoration benefit, in half of the conditions, noise was used to fill the silent gaps. With the filler noise, the interrupted speech stream is more likely to be perceived as a continuous perceptual object, but simultaneously an ambiguity is introduced to the brain. It cannot tell if parts of speech are indeed missing or if they are simply masked. This seems to induce perceptual grouping mechanisms, helping the brain to form an object from audible speech samples and thereby perceptually filling in for missing speech (Shahin et al. 2009). Hence, the noise-filled conditions induce top-down repair mechanisms, and the improved intelligibility resulting from adding noise to silent gaps is taken as a measure of perceptual restoration benefit. The filler noise was a speech-shaped steady noise created by averaging the spectra of the male speech stimuli and randomizing the phase of the average spectrum (Versfeld et al. 2000). By inverting the phase of the same square-wave modulating function and applying it to the filler noise, the filler noise bursts were produced. These bursts were added to the interrupted speech and filled the silent gaps.

In all conditions, with or without the filler noise, the speech presentation level was fixed at 60 dB SPL, and the filler noise presentation level at 70 dB SPL, producing a signal-to-noise ratio of −10 dB (Powers and Wilcox 1977; Başkent 2010, 2012).
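The gap filling and the level relation can be illustrated with a short sketch. Several simplifications are assumed here and are not taken from the study: white Gaussian noise stands in for the speech-shaped noise, a sine tone stands in for speech, the gate has no ramps, and levels are set on the long-term RMS of the continuous signals. All function names are hypothetical.

```python
import math
import random


def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def scale_relative(x, reference, level_db):
    """Scale x so that its RMS lies level_db above the RMS of reference."""
    factor = rms(reference) * 10 ** (level_db / 20) / rms(x)
    return [v * factor for v in x]


def fill_gaps(speech, noise, gate):
    """Gate the speech, gate the noise with the complement, and add them."""
    return [s * g + n * (1.0 - g) for s, n, g in zip(speech, noise, gate)]


fs = 8000                                          # low rate keeps the demo fast
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * n / fs) for n in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]   # white-noise stand-in

# Filler noise 10 dB above the speech (70 dB SPL versus 60 dB SPL).
noise = scale_relative(noise, speech, level_db=70 - 60)
snr_db = 20 * math.log10(rms(speech) / rms(noise))  # about -10 dB

# 5 Hz square-wave gate with 50 % duty cycle (ramps omitted for brevity).
period = fs // 5
gate = [1.0 if (n % period) < period // 2 else 0.0 for n in range(fs)]
stimulus = fill_gaps(speech, noise, gate)
```

Using the complement of the speech gate for the noise guarantees that the noise bursts occupy exactly the silent gaps, so speech and noise never overlap in time.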

Experimental Conditions

To ensure that the effects observed in the experiment were not caused by low baseline scores due to speech rate manipulation or some other age-related factors, a baseline performance was measured at three speech rates (speech rate=0.5, 1, 2) without any interruptions in sentences. All subjects of the study had a baseline score close to ceiling performance (Fig. 3). Next, the interruption conditions (0.625, 1.25, 2.5, 5, 10 and 20 Hz) were applied to the three speech rates. For slow speech, 10 and 20 Hz were excluded, and for normal and fast speech 0.625 Hz, to have speech segment durations more comparable across different conditions (based on a pilot study). All 14 conditions were tested twice, once with and once without the filler noise, producing 31 trials of 13 sentences each, including the three baseline measurements.
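The trial count follows directly from the exclusions described above, as this short enumeration shows. The variable names are illustrative.

```python
SPEECH_RATES = (0.5, 1, 2)
INTERRUPTION_RATES_HZ = (0.625, 1.25, 2.5, 5, 10, 20)

# Interruption rates left out per speech rate, as described in the text.
EXCLUDED = {0.5: {10, 20}, 1: {0.625}, 2: {0.625}}

conditions = [
    (sr, ir)
    for sr in SPEECH_RATES
    for ir in INTERRUPTION_RATES_HZ
    if ir not in EXCLUDED[sr]
]

# Every condition is run once with silent gaps and once with filler noise.
interrupted_trials = [(sr, ir, noise) for sr, ir in conditions
                      for noise in (False, True)]

# One uninterrupted baseline trial per speech rate.
n_trials = len(interrupted_trials) + len(SPEECH_RATES)
```

This gives 4 conditions for slow speech and 5 each for normal and fast speech, i.e. 14 conditions, 28 interrupted trials and 31 trials in total.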

FIG. 3

Intelligibility scores and restoration benefit shown for young and older groups per speech rate. Speech intelligibility scores, averaged for each age group, are shown for interrupted speech with silent gaps (top panels) and with filler noise (middle panels). The lowest panels directly show the restoration effect in percentage points (pp), calculated by taking the difference in scores from top and middle rows. In the lowest panels, all values above 0, shown by the solid, horizontal gray lines, denote a restoration benefit, and if the benefit was significant, it is further marked by a filled symbol. Results with slow, normal and fast speech rates are shown in the left, middle and right columns, respectively. SR stands for speech rate. In all panels, the age effect was tested with post hoc tests, and the significant effects are denoted by ‘*’ for the corresponding conditions. The leftmost symbols in the upper panels (indicated with the letters ‘b’ on the x-axes) show the baseline performances with uninterrupted original sentences, which were identical between the age groups. This made sure that the difference in data between the young and older groups was indeed due to experimental manipulations, and not some other inherent age-related factor. Error bars show ±1 standard error.

Procedure

Each condition was tested with one unique, randomly selected sentence list. Before the presentation of the first test sentence, the same introductory sentence ‘Buiten is het donker en koud’ (‘Outside it is dark and cold’) was played to indicate the beginning of a new list. Because the introductory sentence was processed with the same manipulations as the condition under test, it also served to prime the participants for the specific manipulation that was about to be tested. To indicate the start of each sentence, participants heard a short beep preceding the stimulus.

After listening to each stimulus, the task of the participants was to reconstruct the sentences and formulate them into meaningful, correct Dutch sentences and to verbally report these. Guessing was encouraged to ensure that participants could report what they thought they heard even when they were not sure. Participants could also report only parts of sentences when they were unable to create entire meaningful sentences. Scoring was done online by the experimenter (first author), who sat outside the testing booth and listened to the participants’ responses through headphones via the digital audio recorder. An annotation program developed in Matlab (version 7.10.0.499, 32-bit) was used. A Matlab GUI showed the list number and sentence number and could advance to the next sentence when a participant finished reporting what he/she heard. The experimenter was unaware of which experimental condition was being tested. After each condition/list, the program automatically calculated the percentage of correctly reported words relative to the total number of words in all sentences of the list used in that trial. Each session was also recorded with the digital audio recorder for offline annotation, to double-check potential errors in the online annotation. The annotation rules were in line with the rules mentioned by Başkent (2012).
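The per-list score described above can be sketched as a pooled word count. This is an illustration only: the annotation rules that decide whether a word counts as correct (Başkent 2012) are not reproduced, and the function name and word counts are hypothetical.

```python
def list_score(correct_per_sentence, total_per_sentence):
    """Percentage of correctly reported words, pooled over a whole list."""
    if len(correct_per_sentence) != len(total_per_sentence):
        raise ValueError("need one correct count per sentence")
    return 100.0 * sum(correct_per_sentence) / sum(total_per_sentence)


# Hypothetical list of 13 sentences with four to nine words each.
total = [6, 5, 7, 4, 8, 6, 5, 9, 6, 7, 5, 6, 4]    # words per sentence
correct = [6, 3, 7, 2, 5, 6, 5, 4, 6, 7, 1, 6, 4]  # words reported correctly
score = list_score(correct, total)                 # 62 of 78 words
```

Pooling words over the list, rather than averaging per-sentence percentages, means that longer sentences carry proportionally more weight in the score.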

After the screening and briefing of participants, and following the baseline measurements (randomly assigned to lists 1–3), a short training session without feedback of one condition (interruption rate of 5 Hz and speech rate of 1) with and without filler noise was provided, with each randomly assigned to lists 32 and 33. After training, each one of the 28 interruption conditions was tested with a randomly assigned list from lists 4–31. All participants completed the experiment in a single session, which lasted on average 1.5 h, including the initial screening and occasional breaks.
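The list assignment for one participant can be sketched as follows. The function name, the seed and the placeholder condition labels are hypothetical, and the random number generator actually used in the experiment is not specified in the text.

```python
import random


def assign_lists(condition_labels, rng):
    """Randomly assign sentence lists for one participant.

    Lists 1-3 go to the baselines, lists 32-33 to the two training runs,
    and lists 4-31 to the 28 interruption conditions, each list used once.
    """
    baseline = rng.sample(range(1, 4), 3)
    training = rng.sample(range(32, 34), 2)
    test_lists = rng.sample(range(4, 32), len(condition_labels))
    return {
        "baseline": dict(zip((0.5, 1, 2), baseline)),              # per speech rate
        "training": dict(zip(("silent", "noise"), training)),      # 5 Hz, rate 1
        "test": dict(zip(condition_labels, test_lists)),
    }


# Placeholder labels for the 28 interruption conditions.
labels = [f"condition_{i:02d}" for i in range(1, 29)]
assignment = assign_lists(labels, random.Random(1))

used = (list(assignment["baseline"].values())
        + list(assignment["training"].values())
        + list(assignment["test"].values()))
```

Sampling without replacement ensures that no participant hears the same list twice and that all 33 lists are used exactly once per session.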

RESULTS

Figure 3 shows the average scores for intelligibility of interrupted speech with silent gaps (top panels), for interrupted speech combined with filler noise (middle panels), and for the compensatory restoration benefit (lower panels) per subject group. The columns from left to right show the effect of changing the speech rate from slow to fast. Firstly, for each age group and speech rate, it was determined whether the addition of noise increased intelligibility (i.e., the restoration benefit) by performing separate repeated-measures analyses of variance (RM-ANOVAs) with the within-subjects factor of interruption rate and the factor that represents the addition of noise (Table 1). Secondly, to determine the age effect on restoration and overall intelligibility for each speech rate (all nine panels of Fig. 3), the data in each panel was analyzed with a separate RM-ANOVA with the between-subjects factor of age and within-subjects factor of interruption rate (Table 2). Thirdly, to determine the effect of changing speech rate on restoration, per age group, RM-ANOVAs were performed with within-subject factors of speech rate and the interruption rates that overlapped between speech rates (Table 3). For each RM-ANOVA, sphericity was tested with Mauchly’s Test of Sphericity, and when sphericity was not assumed, degrees of freedom were adjusted using the Greenhouse–Geisser epsilon correction. Age effects were examined in more detail using Tukey’s HSD post hoc tests.
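The restoration benefit entering these analyses is the difference, in percentage points, between the score with filler noise and the score with silent gaps for the same condition. A minimal sketch with hypothetical scores (not data from the study) is:

```python
from statistics import mean


def restoration_benefit(score_noise, score_silent):
    """Benefit in percentage points: noise-filled minus silent-gap score."""
    return score_noise - score_silent


# Hypothetical percent-correct scores of four listeners in one condition.
silent = [42.0, 55.0, 38.5, 61.0]
noise = [50.0, 58.0, 47.5, 60.0]

benefits = [restoration_benefit(n, s) for n, s in zip(noise, silent)]
mean_benefit = mean(benefits)
```

A positive value denotes a restoration benefit, while a negative value, as for the last hypothetical listener, means that adding noise reduced intelligibility.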

TABLE 1 Statistical analyses for restoration benefit per age group
TABLE 2 Statistical analyses for age effects on overall intelligibility of interrupted sentences and the restoration benefit
TABLE 3 Statistical analyses for speech rate effects on restoration benefit per age group

At normal speech rate, regarding the restoration benefit per age group, RM-ANOVAs indicated significant restoration only for the older group (p = 0.009; lower middle panel of Fig. 3 and middle column of Table 1). Nonetheless, post hoc tests showed that both age groups displayed significant restoration benefit at slow interruption rates of 1.25 (p < 0.001 for the young and p = 0.009 for the older group) and 2.5 Hz (p = 0.01 for the young and p < 0.001 for the older group). There was a significant main effect of age on overall intelligibility for both versions of the interrupted speech, with silent gaps (p < 0.001) and with filler noise (p = 0.027) (upper and middle panels of middle column of Fig. 3 and Table 2). This indicates that intelligibility of interrupted speech in general was significantly lower for the older group. Particularly with speech with silent gaps, young adults outperformed the older adults at 2.5 (p < 0.001) and 5 Hz (p = 0.002). Regarding the effect of age on restoration, there was no significant main effect for age (lower middle panel of Fig. 3 and Table 2). Hence, with normal speech rate in general, the older adults obtained comparable restoration benefit to the young adults. Interestingly, the benefit obtained by the older group was significantly larger than that of the younger group at 2.5 Hz (p = 0.015).

With slowed-down speech, both young (p = 0.008) and older adults (p < 0.001) showed significant restoration benefit (lower left panel of Fig. 3 and left column of Table 1). More precisely, young adults showed significant restoration at 1.25 (p < 0.001) and 2.5 Hz (p = 0.040), and older adults at 0.625 (p = 0.043), 1.25 (p < 0.001) and 2.5 Hz (p < 0.001). Similar to normal-rate speech, the main effects for age showed that overall intelligibility was lower for older than young adults for interrupted speech with silent gaps (p = 0.001) and with filler noise (p = 0.031) (Fig. 3 and Table 2, upper and middle panels of left column). Young adults outperformed the older adults with speech with silent gaps at 1.25 (p = 0.037) and 2.5 Hz (p < 0.001) and with filler noise at 1.25 Hz (p < 0.001). Further, regarding the age effect on restoration, there was no significant main effect for age (p = 0.057; lower left panel of Fig. 3 and Table 2). Post hoc tests indicated that the older adults obtained more benefit than young adults at 2.5 Hz (p < 0.001). Slowing down speech increased restoration benefit significantly only for the older adults (p = 0.038; lower left panel of Table 3).

With speeded-up speech, instead of a positive effect from restoration benefit, there was a negative effect of adding noise (lower right panel of Fig. 3 and right column of Table 1). Similar to slow- and normal-rate speech, overall intelligibility was higher for young adults for interrupted speech with both silent gaps (p < 0.001) and filler noise (p < 0.001) (upper and middle panels of right column of Fig. 3 and Table 2). Without noise, young adults outperformed the older adults specifically at 5 (p = 0.012), 10 (p < 0.001) and 20 Hz (p < 0.001), and with noise at 2.5 (p = 0.002), 5 (p < 0.001), 10 (p = 0.013) and 20 Hz (p < 0.001). There was no age effect on restoration.

DISCUSSION

Overall, the results show that older adults can benefit from the top-down repair mechanisms involved in phonemic restoration in a robust way, similar to young adults. This finding is notable, given that the intelligibility of interrupted speech, especially with silent gaps, was worse for older than younger listeners (Fig. 3 and Table 2, top and middle rows), an observation in line with previous studies with similar speech manipulations (Bergman et al. 1976; Gordon-Salant and Fitzgibbons 1993). Despite lower scores for interrupted speech overall, the relative improvement due to restoration was comparable to (and sometimes even better than) that of the younger group. Hence, the phonemic restoration ability is preserved with advanced age despite potential cognitive deterioration of fluid intelligence associated with old age that could have worked against it (Salthouse 1996; Kemper et al. 2003; Humes et al. 2006; Gazzaley et al. 2008; Kennedy and Raz 2009; Charlton et al. 2010; Wild-Wall and Falkenstein 2010; Janse 2012; Tonoki and Davis 2012; Swenor et al. 2013). This suggests that older people are able to compensate for the age-related decrement in perception of degraded speech, probably by relying on their gained knowledge and experience with language (Cattell 1971; Baltes 1993; Park et al. 2002; Salthouse 2004; Pichora-Fuller 2008).

Previously, poorer restoration was observed with older listeners with hearing loss (Başkent et al. 2010; Başkent 2010). Our new findings imply that the reduced restoration must have been mainly due to hearing impairment, a sensory factor that was eliminated in this study, and not age per se. Supporting this idea, even with young participants, compensatory restoration also diminished when they were tested with simulations of hearing devices (Başkent et al. 2009; Başkent 2012; Benard and Başkent 2013a; Bhargava et al. 2013). Specifically, when the bottom-up speech signal lacks the appropriate speech features that can induce top-down linguistic processes, such as lexical activation, inserting noise in the silent gaps may make the speech sound more continuous, but may not necessarily increase the intelligibility (Miller and Licklider 1950; Bhargava et al. 2013). These observations combined indicate that the top-down restoration processes depend on the state of bottom-up speech signals, in line with speech perception models that emphasize the interactive nature of the bottom-up and top-down processes for speech perception (Wingfield et al. 2005; Davis and Johnsrude 2007; Sheldon et al. 2008; Stenfelt and Rönnberg 2009; Sohoglu et al. 2012; Başkent 2012; Goy et al. 2013). In short, based on previous literature regarding phonemic restoration and hearing loss (Başkent et al. 2009, 2010; Başkent 2010, 2012), it might be that even though older adults obtain lower intelligibility than young adults for interrupted speech in general, they nonetheless benefit equally from restoration if the speech signal is not further degraded as would be the case with hearing impairment or hearing devices.

In this study, an additional manipulation, namely the altering of speech rates, was introduced. This was done to further explore potential effects of age-related cognitive slowing (Salthouse 1996). Slowing down speech increased the restoration benefit for the older adults, while speeding it up made the benefit disappear for both participant groups. This observation further confirmed that processing speed plays an important role in understanding degraded speech and in phonemic restoration. Slowed speech seems to give the older adults more time to process noisy speech and use available cues from the speech signal more effectively.

Previously, compensation effects have usually been shown with selected high-performing older adults (Buckner 2004; Hedden and Gabrieli 2004). For this study, no such pre-selection or testing of cognitive skills of participants occurred other than screening for normal hearing. This study shows that older listeners can use top-down restoration to enhance intelligibility of degraded speech. Based on previous studies, it suggests that older people likely use supportive, sentential context better (Pichora-Fuller 2008), effectively utilizing their lifelong experience with language and accrued word knowledge (Cattell 1971; Baltes 1993; Park et al. 2002; Salthouse 2004). Our findings are also in line with the idea that older people may use central cognitive functions differently (Reuter-Lorenz 2002; Buckner 2004; Hedden and Gabrieli 2004; Wong et al. 2009; Grady 2012) or exert more mental effort (McCoy et al. 2005; Getzmann and Falkenstein 2011). Nonetheless, further research can help to identify the precise factors that help older people to compensate, which in turn could lead to new training methods that can help them to learn to perform better. Benard and Başkent (2013a, 2013b) showed that training improved perception of interrupted speech, indicating that people are able to learn to use the top-down repair mechanisms more effectively. Hence, directed cognitive training might help older adults to overcome cognitive deficits in old age (Mahncke et al. 2006; Anderson et al. 2013) and better cope with the complex listening environments of everyday life.