Numerous studies have shown that “ideal” listeners (that is, young, normal-hearing, highly educated listeners) can adapt to idiosyncratic pronunciations through lexically guided perceptual learning in speech perception (McQueen, Cutler, & Norris, 2006; Norris, McQueen, & Cutler, 2003; for an overview, see Samuel & Kraljic, 2009), and are thus able to tune in to a speaker to understand him or her better. The lexically guided perceptual learning effect has been shown using a variety of exposure and test paradigms, for instance, lexical decision and phonetic categorization (e.g., Norris et al., 2003), short story presentation and phonetic categorization (e.g., Eisner & McQueen, 2006), and a picture verification procedure (e.g., McQueen, Tyler, & Cutler, 2012). In the exposure phase, listeners are exposed to an idiosyncratic sound, for instance, a sound ambiguous between [s] and [f] ([f/s]), which would be learned as /s/ if it were heard in words such as platypus (because platypus is an existing word in English, whereas platypuf is not), but as /f/ in words such as giraffe (which is an existing word in English, whereas giras is not). This perceptual-learning effect is caused by a temporary change in phonetic category representations, rather than by changes in decision bias (Clarke-Davidson, Luce, & Sawusch, 2008). Perceptual learning has been found for tones (Mitterer, Chen, & Zhou, 2011) and for different types of sounds, for instance, stops, which differ in voice onset times (/t/ vs. /d/: Kraljic & Samuel, 2007); fricatives, which differ in noise spectra (/s/ vs. /ʃ/: Kraljic & Samuel, 2005, 2007; /f/ vs. /s/: Eisner & McQueen, 2006; McQueen et al., 2006; Norris et al., 2003; Sjerps & McQueen, 2010); liquids, which differ in liquid spectra (/l/ vs. /r/: Scharenborg, Mitterer, & McQueen, 2011); and vowels (McQueen & Mitterer, 2005).

But what about less “ideal” listeners? Are other listener groups besides university students—for instance, young children and older listeners—also capable of perceptual learning? Or, in other words, which listener characteristics actually relate to lexically guided perceptual learning? McQueen et al. (2012) recently showed that 6- and 12-year-olds are also capable of perceptual learning, so even before children are able to read (6-year-olds), they are able to use lexical knowledge to adjust phoneme categories. In the present research, we focused on the flexibility of the speech perception system in an older population. More specifically, our research investigated lexically guided perceptual learning by older listeners by comparing it to perceptual learning by younger listeners. Moreover, we investigated whether a link exists between lexical behavior during exposure and the strength of the perceptual-learning effect. The strength of the perceptual-learning effect can manifest itself in two ways: The magnitude or size of the effect is indicated by the difference between the response curves of the two exposure groups in the perceptual-learning experiment. On the other hand, the duration or stability of the perceptual-learning effect manifests itself as the presence (or absence—i.e., unlearning) of the perceptual-learning effect over time. These two aspects of perceptual-learning strength can be assumed to covary. If phoneme categories are less flexible, it may take more time and/or more exposure for a category change to come about. Once the category is changed, however, it would also take longer to undo this change, resulting in the relative stability of the change. Thus, listeners whose phoneme categories are difficult to change would be expected to show a smaller and more stable perceptual-learning effect than would listeners whose phoneme categories are more flexible.

We analyzed the time course of accepting the odd-sounding items as real words during a lexical-decision exposure task for younger and older listeners. On the one hand, if the rate of acceptance were (relatively) high but did not change over exposure, this could simply be a result of a generally greater tolerance of acoustic ambiguity, while low acceptance could be due to the ambiguous stimuli not being perceived as ambiguous. On the other hand, if greater acceptance of the odd-sounding items emerged over the time course of the lexical-decision task, this could suggest that the items started to sound less odd. This greater acceptance over exposure trials would then indicate the importance of accepting the odd-sounding items as real words for the emergence of the perceptual learning effect. Second, we investigated whether differences in the frequency of acceptance of the odd-sounding items as words would result in differences in the amounts that people shifted their phoneme categories. It might be that listeners who accepted more of the odd-sounding items as words during exposure would show larger category boundary shifts during testing. However, at the same time, participants could also be tolerant of “odd pronunciations” during lexical decision, from the start of the experiment, and leave their categories relatively unaltered. To our knowledge, no lexically guided perceptual learning study has ever directly investigated the link between performance during exposure and perceptual learning.

Aging may particularly affect sensitivity to the higher frequencies in the speech signal, which could result in the loss of sensitivity to phonetic detail. This loss of sensitivity to speech detail may then affect the ability to learn nonstandard pronunciations, as evidence in favor of a certain pronunciation variant would be weaker. Nevertheless, short-term adaptation to accents and to time-compressed speech seems to be preserved with aging and with hearing loss (Adank & Janse, 2010; Golomb, Peelle, & Wingfield, 2007; Gordon-Salant, Yeni-Komshian, Fitzgibbons, & Schurman, 2010; Peelle & Wingfield, 2005). Kennedy, Rodrigue, Head, Gunning-Dixon, and Raz (2009) found that aging per se did not affect the magnitude of the learning gains in perceptual skill learning: Age only indirectly affected learning, via age-related declines in cognitive performance. The ability to adapt to various aspects of speech thus may remain relatively unaffected throughout one’s life. However, older adults have more language experience than younger adults do, which may make their phoneme categories more resistant to change than those of younger adults. Perceptual adjustment of phoneme categories and the conditions under which these adaptations occur in an older population have not yet been investigated.

In this research, we investigated the following questions: (1) Do older adults show perceptual learning effects of a similar size to those of younger adults? (2) Does lexical behavior during exposure predict the strength of the perceptual learning effect? Since we were interested in the flexibility of the speech perception system of older listeners, we aimed to minimize the effect of hearing loss. We therefore used the /l/–/r/ contrast, in which the distinguishing information between the two consonants is mostly in those frequency regions that are supposedly affected to a lesser extent by age-related hearing loss. To ensure that the results were indeed not caused by hearing sensitivity differences, we investigated the effect of hearing loss as a control variable. The main experiment consisted of two parts (following, e.g., Norris et al., 2003, and Scharenborg et al., 2011). First, the listeners were exposed to an ambiguous [l/ɹ] consonant in Dutch words ending in either /r/ or /l/ during a (self-paced) lexical-decision task (the exposure phase). Listeners were divided into two groups: One group was exposed to the ambiguous sound only in /l/-final words, and the other group was exposed to the ambiguous sound only in /r/-final words. In a subsequent (self-paced) phonetic categorization task (the test phase), the listeners were confronted with a range of ambiguous sounds from the [l]–[ɹ] continuum, which appeared as the final phoneme of a nonword, and they were asked to decide whether the sound was /l/ or /r/.

Method

Participants

All of the participants were native Dutch speakers drawn from the Max Planck Institute for Psycholinguistics subject pool and were paid for their participation. A group of 16 young, normal-hearing university students participated in a pretest. The “younger” experimental group consisted of 36 normal-hearing university students (28 female, eight male; mean age 21.2 years), while the “older” group consisted of 60 listeners 60+ years of age from the Nijmegen area (36 female, 24 male; mean age 71.5 years, age range 60–88; there was no age cutoff). The age groups differed in size because the older adults were compared to a group of younger adults who had been tested in a different project (Scharenborg et al., 2011). Furthermore, investigating the effects of age and hearing loss, which were only analyzed within the older-adult group, requires a larger sample size. The hearing sensitivity of the older listeners was assessed with a portable Maico ST 25 screening audiometer (air conduction thresholds only, for octave frequencies from 250 Hz–8 kHz) in a sound-attenuated booth. A pure-tone average threshold was computed as the average over the participants’ thresholds at 1, 2, and 4 kHz. Six of the participants wore hearing aids in their daily life, but they did not wear them during testing. The mean pure-tone average (PTA, in the participants’ better ear) was 25.8 dB HL (SD = 13.3, range 5.0–63.3); the higher the participants’ PTAs, the poorer their hearing sensitivity.
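As a rough illustration of the pure-tone average described above, the following Python sketch averages the thresholds at 1, 2, and 4 kHz for each ear and keeps the better (lower) value. The audiogram values and the function names are hypothetical and are not data from the study.

```python
from statistics import mean

# Frequencies (Hz) that enter the pure-tone average, as stated in the text.
PTA_FREQUENCIES_HZ = (1000, 2000, 4000)


def pure_tone_average(thresholds_db_hl):
    """Mean air-conduction threshold (dB HL) over 1, 2, and 4 kHz for one ear."""
    return mean(thresholds_db_hl[f] for f in PTA_FREQUENCIES_HZ)


def better_ear_pta(left, right):
    """PTA of the better ear; a lower PTA means better hearing sensitivity."""
    return min(pure_tone_average(left), pure_tone_average(right))


# Hypothetical audiogram for one listener (octave frequencies, 250 Hz to 8 kHz).
left = {250: 15, 500: 15, 1000: 20, 2000: 30, 4000: 45, 8000: 60}
right = {250: 10, 500: 15, 1000: 15, 2000: 25, 4000: 40, 8000: 55}

pta = better_ear_pta(left, right)
print(f"Better-ear PTA: {pta:.1f} dB HL")
```

Thresholds at 250 Hz, 500 Hz, and 8 kHz are measured but do not enter the average in this sketch, in line with the description in the text.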

Materials

For the exposure phase, 200 Dutch words were selected from the CELEX database (Baayen, Piepenbrock, & Gulikers, 1995). Forty of the words ended in /l/, and 40 ended in /r/; no /l/s or /r/s occurred elsewhere in these 80 words. Since the sounds [l] and [ɹ] color the pronunciation of the preceding vowel, the vowel preceding [l] and [ɹ] was kept constant, such that all of the words ended in /əl/ or /ər/. The number of syllables was matched between the two sets of critical items (i.e., /l/-final and /r/-final words). In each set, 25 words had two syllables (e.g., ezel “donkey”), ten had three syllables (e.g., postzegel “stamp”), and five had four syllables (e.g., sinaasappel “orange”). Appendix A provides a full listing of all 80 of the critical items. Word frequency and stress patterns were matched between the two sets as far as possible.

A total of 120 words were selected as fillers, and 200 filler nonwords were also constructed. Both sets of fillers followed the same syllable-length distribution as the critical items. /l/ and /r/ did not occur in any of these items. The nonwords followed Dutch phonotactic rules and tended to become nonwords before their final phonemes.

All of the words were produced in isolation by a female native speaker of Dutch (from the Western part of the Netherlands) and were digitally recorded in a sound-attenuated booth at 44 kHz. She also recorded the nonwords kwiptel and kwipter for use in the test phase.

Creating the ambiguous stimuli

From the natural recordings, versions of the 80 critical words were created with ambiguous final sounds substituted for /l/ and /r/. These ambiguous [l/ɹ] sounds and the test continuum for the phonetic categorization task were selected using a phonetic categorization pretest. The selection of the ambiguous sounds was done separately for each final syllable type present in the full set of 80 critical items for the lexical-decision task. A total of 11 different final /Cəl/ɹ/ sequences were presented among the set of 80 words. Note that due to devoicing of fricatives in Dutch, syllables beginning with /s/ and /z/ were treated as the same sequence, and likewise for /f/ and /v/ and for /x/ and /ɣ/. The subset of words used for the pretest—that is, one pair of words for each of the 11 sequences—is listed in Appendix B (with their English translations and nonword counterparts).

For each pair of words (e.g., winkel and wekker), the final syllable was excised using Praat (Boersma & Weenink, 2005). All of the excised [l] and [ɹ] final syllables were zero-padded with 25 ms of silence at onset and offset to allow for valid pitch estimation at the start and end of the syllable. Subsequently, each syllable received the same stylized pitch contour (based on the naturally occurring pitch contour of the final syllables in the critical items) using Praat. The resulting pairs of syllables were then each morphed to create equally spaced 11-step continua using STRAIGHT (Kawahara, Masuda-Katsuse, & Cheveigne, 1999) in MATLAB. Figure 1 shows the ambiguous syllable [kəl/ɹ] (top and third panels). This syllable was Step 5 on the morphed continuum between the natural versions of [kəl] (second panel) and [kəɹ] (bottom panel). The ambiguous syllables were then concatenated, using Praat, as final syllables onto the first syllables of the matching /l/-final and /r/-final words; for instance, the morphs for /kəl/ɹ/ were concatenated with both /wɪŋ/ (yielding winkel) and /wε/ (yielding wekker).

Fig. 1 The top panel shows the acoustic signal for the zero-padded ambiguous syllable [kəl/ɹ], and the bottom three panels show, respectively, the spectrograms for the natural version of [kəl], the ambiguous [kəl/ɹ], and the natural version of [kəɹ].

The pretest stimuli were presented in three blocks, each consisting of 132 items, with a newly randomized order in each block. In each block, the 16 pretest participants heard six [l]–[ɹ] continuum steps (Steps 1, 3, 4, 6, 7, and 9) for each of the 11 syllables. These steps were chosen to sample perception of the entire continuum (excluding the endpoints). Since each morph was concatenated with both an /l/- and an /r/-final word, each morph was heard twice per block.
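The block structure described above can be sketched in Python as follows. The sketch only reproduces the counts given in the text (six continuum steps, 11 syllables, two carrier words per morph, three freshly randomized blocks); the item labels, the seed, and the function name are assumptions made for illustration.

```python
import random
from itertools import product

STEPS = (1, 3, 4, 6, 7, 9)        # continuum steps presented in the pretest
N_SYLLABLES = 11                  # number of distinct final syllables
CARRIERS = ("l_word", "r_word")   # each morph is spliced onto both carrier words
N_BLOCKS = 3


def build_pretest_blocks(seed=0):
    """Return three blocks, each a fresh random order of the same 132 items."""
    rng = random.Random(seed)
    items = list(product(range(N_SYLLABLES), STEPS, CARRIERS))
    blocks = []
    for _ in range(N_BLOCKS):
        block = items[:]          # copy so every block has the full item set
        rng.shuffle(block)
        blocks.append(block)
    return blocks


blocks = build_pretest_blocks()
print(len(blocks), [len(b) for b in blocks])
```

Each block contains 11 × 6 × 2 = 132 items, which matches the block size reported in the text.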

The task for the participants was to indicate by buttonpress as quickly and as accurately as possible whether they heard [l] or [ɹ]. To aid the participants, the /l/ interpretation of the stimulus was shown on the bottom left of the computer screen, and the /r/ interpretation of the stimulus was on the bottom right. If the /l/ interpretation was a word, the right option was a nonword, and vice versa. This procedure ensured that participants did not receive consistent lexical guidance about how to interpret the stimuli (i.e., the same ambiguous sound would receive lexical support for an /l/ interpretation on one trial and an /r/ interpretation on a different trial), so that phoneme category retuning would not take place. Appendix B shows the word and nonword pairs that were used. Each stimulus was presented over headphones 500 ms after trial onset.

Due to an error in the testing software, the pretest for the [fəl]–[fəɹ] morphs had to be done separately. Six participants each heard ten repetitions of each [fəl]–[fəɹ] morph, and the rest of the experimental setup was identical to the main pretest.

The total proportions of /r/ responses to each of the tested morphs were calculated, and the most ambiguous morph was determined for each of the 11 syllables. The most ambiguous morph for syllables starting with /k, x, b/ was Step 5 (with Step 0 being a natural [l] and Step 10 a natural [ɹ]); for /m, d, f/, it was Step 3; for /t, ŋ, n/, it was Step 6; for /p/, it was Step 2; and for /z/, it was Step 7. However, after testing another separate group of six younger participants on the lexical-decision task, the results showed that most of the /l/ words ending in ambiguous [l/ɹ] were not recognized as words. For the actual experiment, the ambiguous morphs were therefore changed to the next most [l]-like step: Step 4 for /k, x, b/; Step 2 for /m, d, f/; Step 5 for /t, ŋ, n/; Step 1 for /p/; and Step 6 for /z/. The selected morphs were then concatenated as final syllables onto the nonfinal syllables of the matching /l/-final and /r/-final words, as had been done to create the stimuli for the pretest. This resulted in 80 stimulus pairs consisting of the same word ending either in a natural [l] or [ɹ] or in the selected ambiguous [l/ɹ]. These stimuli were then used in the lexical-decision task.
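One way to picture this selection is sketched below in Python. The text does not say how untested steps (such as Step 5) were arrived at, so the linear interpolation between tested steps is an assumption, as are the response proportions and the function names. The final shift by one step toward [l] follows the adjustment described above.

```python
def crossover_step(r_proportions):
    """Estimate where the proportion of /r/ responses crosses 0.5.

    r_proportions maps tested continuum steps to the proportion of /r/
    responses. Linear interpolation between neighboring tested steps is an
    assumption of this sketch, not a documented part of the procedure.
    """
    steps = sorted(r_proportions)
    for lo, hi in zip(steps, steps[1:]):
        p_lo, p_hi = r_proportions[lo], r_proportions[hi]
        if p_lo <= 0.5 <= p_hi and p_hi > p_lo:
            return lo + (0.5 - p_lo) * (hi - lo) / (p_hi - p_lo)
    raise ValueError("no 0.5 crossover between tested steps")


def shift_towards_l(step, minimum=0):
    """Move one step toward the natural [l] endpoint (Step 0)."""
    return max(step - 1, minimum)


# Hypothetical pretest proportions for one syllable (not data from the study).
r_props = {1: 0.05, 3: 0.20, 4: 0.40, 6: 0.60, 7: 0.85, 9: 0.97}

most_ambiguous = int(round(crossover_step(r_props)))
exposure_step = shift_towards_l(most_ambiguous)
print(most_ambiguous, exposure_step)
```

With these made-up proportions the crossover falls at Step 5 and the exposure morph becomes Step 4, which mirrors the pattern reported for the syllables starting with /k, x, b/.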

The stimuli used in the test phase consisted of five versions of kwipte[l/ɹ]. These were created by concatenating five different versions of the ambiguous [l/ɹ] sound as final syllables onto the first syllable kwip (excised from a recording of the nonword kwipter). The steps (i.e., Steps 2, 4, 5, 6, and 8, where Step 5 was the ambiguous sound used in the lexical-decision task) were taken from the [təl]–[təɹ] continuum created for the pretest. Both the /l/-final and /r/-final readings of the resulting string are nonwords in Dutch.

Procedure

Two experimental word lists were created in which the test items appeared in a pseudorandomized running order. The restrictions were that no critical item (i.e., no word ending in [l/ɹ]) was allowed to appear in the first six words, and no two critical items could appear within a range of four words. Each list consisted of 400 words—that is, the 200 nonwords, 120 filler words, and 40 words ending in a natural [l] or [ɹ]—and 40 critical items—that is, the /r/-final or /l/-final words ending in [l/ɹ]. The difference between the two word lists was that one list contained only the natural /r/-final words and /l/-final words ending in [l/ɹ], and the other list contained only the natural /l/-final words and the /r/ words ending in [l/ɹ]. The younger and older listener groups were split into two groups and assigned one of the two experimental word lists. In one group, 18 younger and 30 older listeners heard the ambiguous /l/-final words during exposure, and in the other, 18 younger and 30 older listeners heard the ambiguous /r/-final words during exposure.
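The ordering restrictions can be illustrated with a short Python sketch. It reads "within a range of four words" as "the positions of two critical items differ by at least four", which is an interpretation rather than a documented detail; the item labels and the function name are likewise hypothetical.

```python
import random


def pseudorandomize(critical, others, lead_in=6, min_distance=4, seed=None):
    """Return a running order that respects the two restrictions.

    No critical item appears in the first `lead_in` positions, and the
    positions of any two critical items differ by at least `min_distance`.
    """
    rng = random.Random(seed)
    n_crit = len(critical)
    n_total = n_crit + len(others)
    # Slots that remain once the mandatory gaps between critical items are
    # reserved; sampling from them and re-adding the gaps yields valid positions.
    free = (n_total - lead_in) - (n_crit - 1) * (min_distance - 1)
    if free < n_crit:
        raise ValueError("constraints cannot be satisfied")
    base = sorted(rng.sample(range(free), n_crit))
    positions = {lead_in + b + i * (min_distance - 1) for i, b in enumerate(base)}

    crit, rest = critical[:], others[:]
    rng.shuffle(crit)
    rng.shuffle(rest)
    crit_iter, rest_iter = iter(crit), iter(rest)
    return [next(crit_iter) if i in positions else next(rest_iter)
            for i in range(n_total)]


# 40 critical items plus 360 other items (200 nonwords, 120 filler words,
# and 40 natural liquid-final words), giving the 400 items per list.
critical = [("critical", i) for i in range(40)]
others = [("other", i) for i in range(360)]
order = pseudorandomize(critical, others, seed=1)
```

Sampling the positions directly, rather than shuffling and rejecting invalid orders, guarantees that a valid list is produced in a single pass.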

Participants were tested individually in a sound-treated booth. The stimuli were presented binaurally over closed headphones, and participants were asked to press a button as quickly and accurately as possible when they heard a word (left button) or a nonword (right button). They were not informed about the presence of the ambiguous sounds. The lexical-decision task lasted approximately 25 min. Subsequently, the participants were tested using a phoneme categorization test. They were asked to decide as quickly and accurately as possible, by buttonpress, whether the stimulus ended in /l/ or in /r/. The five ambiguous kwipte[l/ɹ] stimuli were each presented six times per block and were newly randomized for each of a total of three blocks (90 test items in total). The /l/ interpretation of the stimulus (kwiptel) was shown on the bottom left of the computer screen, and the /r/ interpretation (kwipter) on the bottom right. The phonetic categorization task lasted approximately 8 min. All of the stimuli for all participants were presented at an average intensity level of 75 dB SPL.

Results

We investigated whether older listeners would show a perceptual learning effect of a similar size to that shown by younger listeners, by means of an age group comparison. To that end, we investigated performance and the time course of accepting ambiguous items as words in the lexical-decision task, and the perceptual learning effect as exhibited in the phonetic categorization task. The final analyses focused on whether age and hearing sensitivity predict perceptual learning.

All of the analyses were carried out using generalized linear mixed-effect models (e.g., Baayen, Davidson, & Bates, 2008), containing both fixed and random effects, with the logit link function. The fixed and random factors differed between analyses and are therefore listed for each analysis separately. The model parameters were estimated by maximum likelihood, and categorical predictors were dummy coded. A generalized linear model has the form

$$ \mathrm{logit}\;p = c + \beta_1\,\mathrm{Factor}_1 + \beta_2\,\mathrm{Factor}_2 + \beta_3\,\mathrm{Factor}_3 + \ldots, $$

where logit p represents log[p/(1 − p)], the log odds of the response coded as 1; logit p is the “dependent variable,” and the constant c is the intercept. The different βs (Chatterjee, Hadi, & Price, 2000) represent the relevance (effect size) of the different predictors for the estimation of logit p. In each analysis, a best-fitting model was built using the fixed and random variables. We started by building the most complex model, that is, the model with all possible interactions between the predictors. Subsequently, interactions and predictors that proved not to be significant were removed step by step from the model. The best-fitting model only contained predictor variables and interactions that were significant. We only report statistically significant effects and the absolute estimated values of the different βs, with an explanation of the found effect.
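To make the logit scale concrete, the Python sketch below converts between probabilities and log odds and evaluates the fixed-effects part of such a model for a dummy-coded predictor. The intercept and β value are hypothetical and are not estimates from this study; random effects are left out for brevity.

```python
import math


def logit(p):
    """Log odds of a probability p, i.e., log[p / (1 - p)]."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_probability(intercept, betas, factors):
    """Probability implied by c + beta_1 * Factor_1 + beta_2 * Factor_2 + ..."""
    eta = intercept + sum(b * f for b, f in zip(betas, factors))
    return inv_logit(eta)


# Hypothetical model with one dummy-coded predictor: the reference level
# (coded 0) sits on the intercept, and the other level is coded 1.
c, beta = 2.0, -1.5
p_reference = predicted_probability(c, [beta], [0])
p_other = predicted_probability(c, [beta], [1])
print(f"{p_reference:.3f} {p_other:.3f}")
```

A negative β thus lowers the predicted probability of a response coded as 1 relative to the level on the intercept, which is how the estimates in the following sections can be read.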

Lexical decision

In accordance with Norris et al. (2003), participants who judged fewer than 20/40 of the [l/ɹ] items as being words were excluded from further analyses, due to “poor” performance. This resulted in the exclusion of one participant from the younger group, who heard natural /l/-final words and ambiguous /r/-final words, and nine participants from the older group (none of whom wore hearing aids in daily life; four heard natural /r/-final and ambiguous /l/-final words, and five heard natural /l/-final and ambiguous /r/-final words), leaving 51 older listeners in the analyses.
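The exclusion rule amounts to a simple threshold on the number of ambiguous items accepted as words. A minimal Python sketch, with hypothetical participant identifiers and responses:

```python
MIN_ACCEPTED = 20  # participants accepting fewer than 20 of the 40 items are excluded


def acceptance_percentage(responses):
    """Percentage of ambiguous items judged to be words (True = "yes")."""
    return 100.0 * sum(responses) / len(responses)


def is_retained(responses):
    """Apply the criterion: at least 20 of the 40 ambiguous items accepted."""
    return sum(responses) >= MIN_ACCEPTED


# Hypothetical responses to the 40 ambiguous items for two participants.
responses_by_participant = {
    "p01": [True] * 36 + [False] * 4,
    "p02": [True] * 19 + [False] * 21,
}

retained = {
    pid: acceptance_percentage(resp)
    for pid, resp in responses_by_participant.items()
    if is_retained(resp)
}
print(retained)
```

Applied to the reported exclusions, the criterion leaves 36 − 1 = 35 younger and 60 − 9 = 51 older listeners in the analyses.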

The percentages of “yes” responses for the nonword filler items were 1.8 % for the younger listeners and 2.2 % for the older listeners. Table 1 shows the mean percentages of “yes” responses for the natural and ambiguous versions of the /l/-final and /r/-final words for the listeners who were exposed to the ambiguous sound in either /r/-final or /l/-final words. Listeners from both age groups accepted most of the stimuli ending in [l/ɹ] as words. Note that, in many cases, listeners did not need the final phoneme to recognize the word: Many of the multisyllabic words used in the present study became unique before the final phoneme.

Table 1 Performance on the lexical-decision task, as mean percentages of “yes” responses for the natural and ambiguous versions of the /l/- and /r/-final words

The responses to the natural liquids, filler words, and ambiguous items were subsequently analyzed statistically. We investigated the question of whether listeners immediately accept ambiguous items as words or become more tolerant over trials. The critical issue here was the time course of accepting an ambiguous target stimulus as a word during the lexical-decision task. We therefore only focused on the items that could potentially elicit a “yes it is a word” response, thus ignoring the nonword items. The responses of the two exposure groups were taken together, and two new categories were created: “natural,” containing the responses to the natural items in the two exposure groups (i.e., the natural /l/-final words in the group of listeners who learned to map the ambiguous sound onto /r/, and the natural /r/-final words in the group of listeners who learned to map the ambiguous sound onto /l/), and “ambiguous,” containing the responses to the ambiguous items in the two exposure groups. The third type of words was the word fillers (on the intercept). The dependent variable was whether the response to the word type was “yes” (coded as 1) or “no” (coded as 0). The fixed predictors were trial, word type (natural, ambiguous, or word filler), and age group (younger vs. older, with the former group being on the intercept). Items and Subjects were the random predictors.

Fewer “yes it is a word” responses were given to ambiguous items (89.6 %; β = −2.9283, SE = 0.4325, p < .001) than to the filler words (97.1 %) by both age groups. However, over trials, listeners from both age groups started to give more “yes” responses to the ambiguous items (β = 0.0064, SE = 0.0017, p < .001), so both the younger and the older listeners seemed to “learn” to accept the ambiguous items over the course of the experiment. Moreover, older listeners gave fewer “yes” responses to the natural items (97.2 %; β = −0.7978, SE = 0.3183, p < .05) than did the younger listeners (98.6 %). Fewer “yes” responses were given to the word fillers by the younger listeners over trials (β = −0.0023, SE = 0.0009, p < .01). Since we found no significant difference between the numbers of “yes” responses to word fillers and to words with natural liquids, we can conclude that younger listeners seemed to get less sure over trials about whether or not the natural items were words. An interaction between trial and age group showed that this growing uncertainty over trials occurred less for the older listeners (β = 0.0020, SE = 0.0008, p < .01). Perhaps the task itself made the younger listeners more cautious over trials, making them more uncertain on the natural items as well.

Summarizing, the results showed that both the younger and older listeners who were exposed to the ambiguous [l/ɹ] in the normally /l/-final words tended to interpret the ambiguous sound as /l/, whereas listeners who were exposed to [l/ɹ] in the context of normally /r/-final words interpreted [l/ɹ] as /r/. Moreover, younger and older listeners showed similar time-course effects on the lexical-decision task: The listeners grew more tolerant over trials, in that they accepted more ambiguous items as words over the course of the lexical-decision task.

Phonetic categorization

Figure 2 shows the proportions of /l/ and /r/ responses for the five ambiguous stimuli in the phonetic categorization task, separately for the three blocks. The responses for the listeners who were exposed to [l/ɹ] in the normally /r/-final words are indicated in the figure by “r”s for the younger listeners and “R”s for the older listeners. The responses for the listeners who were exposed to [l/ɹ] only in the normally /l/-final words are indicated by “l”s for the younger listeners and “L”s for the older listeners. The responses to the five ambiguous stimuli were subsequently analyzed (the dependent variable was whether the response was /l/, coded as 0, or /r/, coded as 1). The fixed predictors were Exposure Group (exposed to the ambiguous sound only in /l/-final words [on the intercept] or only in /r/-final words during the lexical-decision task), Age Group (older listener group on the intercept), and Block; stimulus step (a continuous variable with Step 3 on the intercept and steps not spaced linearly) was used as a control variable. Subject was the only random factor, as all ambiguous sounds were embedded in the same kwipte[l/ɹ] nonword context. In our report of the analysis, we focus on only those results that are relevant to the research question.

Fig. 2 Total proportions of /r/ responses for the two exposure groups per test block: The “r” and “R” labels within the graphs indicate the groups of younger and older listeners, respectively, who learned to map [l/ɹ] onto [ɹ], and “l” and “L” indicate the groups of younger and older listeners who learned to map [l/ɹ] onto [l] for the five ambiguous test stimuli.

Table 2 displays the parameter estimates in the best-fitting model of performance. Both age groups showed an effect of exposure group on phonetic categorization. In general, older listeners who were exposed to [l/ɹ] in the normally /r/-final words were strongly biased to label the sounds on the continuum as /r/, while those older listeners who were exposed to [l/ɹ] in the normally /l/-final words were less likely to do so. This difference between the curves of the /r/ responses for the two exposure groups (within each age group) is the perceptual learning effect. The difference between the mean proportions of /r/ responses made by listeners in the /r/-final and /l/-final exposure groups was larger for younger listeners in Block 1 (18.5 % vs. 61.2 % for the /l/ vs. /r/ exposure groups, respectively) than for older listeners (35.6 % vs. 71.6 %, for the /l/ vs. /r/ exposure groups, respectively; shown by the Exposure Group × Age Group interaction); that is, the younger listeners not only showed a perceptual-learning effect, but the magnitude of the effect was also significantly larger for the younger than for the older listeners in Block 1. The difference in the magnitudes of perceptual learning between the age groups decreased in subsequent blocks, as witnessed by a three-way interaction between Block, Age Group, and Exposure Group (note that if we averaged the results over all test blocks, the sizes of the perceptual-learning effects between the two age groups did not differ). Table 3 shows the sizes of the perceptual learning effect per age group. Indeed, a per-block analysis showed that the interaction between Age Group and Exposure Group was no longer significant in the later test blocks (Block 2, β = −0.5216, SE = 0.9369, p > .5; Block 3, β = −1.4653, SE = 0.9882, p > .1). The younger listeners thus showed “unlearning,” while the learning effect for the older listeners remained stable over blocks.
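The Block 1 effect sizes implied by the percentages quoted above can be worked out directly. The Python sketch below uses only those four percentages; the variable and function names are chosen for illustration.

```python
# Mean percentages of /r/ responses in Block 1, as quoted in the text,
# keyed by age group and exposure group ("l" = /l/-final, "r" = /r/-final).
block1_r_responses = {
    "younger": {"l": 18.5, "r": 61.2},
    "older": {"l": 35.6, "r": 71.6},
}


def learning_effect(percentages):
    """Difference in /r/ responses between exposure groups, in percentage points."""
    return round(percentages["r"] - percentages["l"], 1)


effects = {group: learning_effect(p) for group, p in block1_r_responses.items()}
print(effects)  # {'younger': 42.7, 'older': 36.0}
```

In Block 1 the raw difference is therefore 42.7 percentage points for the younger listeners against 36.0 for the older listeners. Whether this difference is reliable is a question for the mixed-effects model, not for this raw comparison.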

Table 2 Lexically guided perceptual learning: Fixed-effect estimates for the best-fitting model of performance in the phonetic categorization task (n = 7,740)
Table 3 Differences in mean percentages of /r/ responses by the listeners in the /r/-final exposure group versus the /l/-final exposure group, for younger and older listeners

Summarizing, both the younger and older listener groups showed perceptual learning of ambiguous sounds on the /l/–/r/ continuum. However, returning to our first research question: The perceptual-learning effect was larger right after exposure for younger listeners, while the effect was more stable for older listeners.

Predicting phonetic categorization performance from lexical-decision performance

In the following analyses, we investigated whether lexical behavior during exposure and age differences (among the older adults) predict the strength of the perceptual learning effect. The first analysis focused on whether differences in the frequency of acceptance of the odd-sounding items as words during the lexical-decision task resulted in differences in the amounts that people would shift their phoneme categories. To that end, we investigated whether listeners who more often judged ambiguous items to be words during the lexical-decision task gave more learning-consistent responses (i.e., more /r/ responses when exposed to the ambiguous sound in /r/-final words, and more /l/ responses when exposed to the ambiguous sound in /l/-final words). We focused on the ambiguous stimuli because these are the crucial items in the lexical-decision task that were supposed to induce phonetic category adjustment. In each age group, the two exposure groups were taken together, and a new “learning-consistent” category was created in which the /r/ responses during the phonetic categorization task among the group of listeners exposed to the ambiguous sound in /r/-final words and the /l/ responses of the group of listeners exposed to the ambiguous sound in the /l/-final words were combined. Moreover, we only analyzed data from the stimulus steps of interest (i.e., the most ambiguous steps: 2, 3, and 4). Percentages of ambiguous items accepted as words during the lexical-decision task were calculated for each participant, and used as a fixed predictor of whether the category response was learning consistent (the dependent variable, coded as 0 and 1, for not learning consistent and learning consistent, respectively). Age Group (younger listener group on the intercept), Stimulus Step, and Test Block were used as control variables; Subject was the random factor.
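The recoding into a "learning-consistent" variable can be sketched in Python as follows. The trial records, field names, and function names are hypothetical; only the coding rule and the restriction to Steps 2, 3, and 4 come from the description above.

```python
STEPS_OF_INTEREST = (2, 3, 4)  # the three most ambiguous test steps


def learning_consistent(exposure_group, response):
    """1 if the response matches the exposure condition, else 0.

    Both arguments are "l" or "r": an /r/ response after exposure to the
    ambiguous sound in /r/-final words counts as consistent, and likewise
    an /l/ response after exposure in /l/-final words.
    """
    return int(exposure_group == response)


def code_trials(trials):
    """Keep trials at the steps of interest and add the 0/1 dependent variable."""
    return [
        {**t, "consistent": learning_consistent(t["exposure"], t["response"])}
        for t in trials
        if t["step"] in STEPS_OF_INTEREST
    ]


# Hypothetical categorization trials (not data from the study).
trials = [
    {"exposure": "r", "response": "r", "step": 3},
    {"exposure": "r", "response": "l", "step": 2},
    {"exposure": "l", "response": "l", "step": 4},
    {"exposure": "l", "response": "r", "step": 1},  # dropped: not a step of interest
]

coded = code_trials(trials)
print([t["consistent"] for t in coded])  # [1, 0, 1]
```

With 86 retained participants, three steps, and 18 presentations per step (six per block over three blocks), this selection yields 86 × 3 × 18 = 4,644 observations, consistent with the n reported for Table 4.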

Table 4 displays the parameter estimates in the final model. Listeners who more often judged an ambiguous item as being a word in the lexical-decision task gave more learning-consistent responses during the phonetic categorization task; that is, they showed stronger perceptual learning than did listeners who less often judged an ambiguous item as being a word. These listeners thus seem to have retuned their phoneme categories more.

Table 4 Fixed-effect estimates for the best-fitting model of “learning-consistent” performance in the phonetic categorization task (n = 4,644)

Subsequently, the effect of age on the perceptual-learning effect was investigated among the older participants. Since hearing loss is common among older listeners, Hearing Loss was used as a control variable (centered on the mean). The dependent variable was again the “learning-consistent” category, taking into account only the three most ambiguous stimulus steps. Age and hearing sensitivity were correlated (r = .40, p < .005). To reduce collinearity in the model, a residual was created for Age (with Hearing Loss partialed out), and this residual was used as a fixed predictor. The other control variables were Stimulus Step and Test Block, and Subject and Items were the random factors.
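
As an illustration of the residualization step, the sketch below partials Hearing Loss out of Age. It assumes a simple least-squares regression of Age on Hearing Loss, because the text does not specify the exact procedure. The function names and the data values are hypothetical.

```python
from statistics import mean


def center(values):
    """Center a variable on its mean."""
    m = mean(values)
    return [v - m for v in values]


def residualize(y, x):
    """Residuals of y after a simple least-squares regression of y on x.

    The residuals are the part of y that is linearly unrelated to x.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    xc, yc = center(x), center(y)
    sxx = sum(v * v for v in xc)
    if sxx == 0:
        raise ValueError("x has no variance")
    slope = sum(a * b for a, b in zip(xc, yc)) / sxx
    # With both variables centered, the intercept is zero.
    return [b - slope * a for a, b in zip(xc, yc)]


# Hypothetical ages (years) and hearing-loss values for five older listeners.
ages = [62, 65, 70, 74, 80]
hearing_loss = [15.0, 22.0, 20.0, 35.0, 30.0]

hearing_loss_centered = center(hearing_loss)   # control variable
age_residual = residualize(ages, hearing_loss)  # fixed predictor
```

By construction, the residualized Age predictor is uncorrelated with Hearing Loss in the sample, so the two variables no longer compete for the same variance in the model.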

Consistent with the age group comparison, in which the younger listeners initially had a larger perceptual-learning effect than did the older listeners, the perceptual-learning effect in the first block was smaller with increasing age (β = −0.1142, SE = 0.0385, p < .005). Moreover, the decrease in the perceptual-learning effect over blocks was smaller with increasing age (β = 0.04473, SE = 0.0133, p < .001), which is also consistent with the results of the group comparison. Hearing sensitivity did not modify the size or stability of the perceptual-learning effect, suggesting that our stimuli were indeed audible for listeners with hearing loss.

General discussion

Our research was inspired by numerous findings that young university students show lexically guided perceptual learning (for an overview, see Samuel & Kraljic, 2009). We focused on the flexibility of the speech perception system in an older population. More specifically, through this research we tried to answer the following two questions. First, do older adults show perceptual learning effects of a similar size to those of younger adults? This was investigated by comparing lexically guided perceptual learning of older listeners (age 60+) with that of younger listeners. Second, does lexical behavior during exposure predict the strength of the perceptual learning effect? To ensure that the results were not caused by age differences in hearing sensitivity, the effect of hearing loss was investigated as a control variable.

The perceptual-learning experiment consisted of two parts: an exposure phase consisting of a lexical-decision task, and a phonetic categorization test phase. The lexical-decision results showed that both the younger and the older listeners interpreted the majority of the stimuli with ambiguous final [l/ɹ] as words. Moreover, the time course of accepting the ambiguous items as words during exposure was similar for the two age groups: Listeners in both groups increasingly accepted ambiguous items as words over the course of the lexical-decision task. This finding suggests that the increased acceptance of the odd-sounding items as real words may reflect the perceptual-learning effect.

Despite the similarity in the time courses of acceptance of ambiguous items over exposure, age-related differences did appear in the strength of the perceptual-learning effect: The effect was stronger right after exposure for the younger listeners, but was more stable for the older listeners; that is, younger listeners showed “unlearning,” whereas older listeners did not. This age effect was confirmed in a subsequent analysis of the effect of age among the older adults: Within the group of older listeners, too, the perceptual-learning effect became smaller but more stable with increasing age. Importantly, this difference in the pattern of perceptual learning could not be explained by age-related differences in hearing loss, since no effect of hearing loss on the strength of the perceptual-learning effect was found.

These findings raise the question of what “age” is or what it represents. As in the study by Kennedy et al. (2009), age per se might not have affected perceptual learning directly, but mainly indirectly, via age-related changes in cognitive or linguistic abilities that we did not explicitly measure. Older persons, for instance, have (much) more linguistic experience than younger persons. Perhaps language experience makes phonetic categories more robust and resistant to larger or faster changes for older adults, while younger adults may have sparser, more malleable categories. As we argued in the introduction, if category changes take more time and/or need more compelling evidence, undoing these changes would also take longer, resulting in relatively stable learning. Even though we did not find an age group difference in acceptance of the ambiguous words as words over all exposure trials, a subset analysis on the first two thirds of the 200 (ambiguous, natural, and filler) word exposure trials indeed showed stronger initial reluctance to accept the ambiguous stimuli as words among the older listeners, as compared to the younger listeners (β = −0.0180, SE = 0.0081, p < .05). This age difference in initial acceptance mirrors the age difference in phoneme category flexibility observed in the phonetic categorization task. Note, however, that the smaller but more stable perceptual retuning effect with increasing age was also found within the group of older adults. As one might expect linguistic experience to plateau in older age, this linguistic-experience account may not provide a full explanation.

Another explanation for the age difference in category adjustment could be an age-related decline in the efficiency of inhibitory processes (Hasher & Zacks, 1988; Mattys & Scharenborg, under review; Zacks & Hasher, 1994). Such a reduction of efficient inhibitory processes might affect the dynamics of spoken-word recognition by resulting in less deactivation of similar-sounding lexical candidates in older adults. Older listeners have indeed been shown to be more affected by competition from similar-sounding words than are younger listeners (cf. Ben-David et al., 2011; Sommers, 1996; Sommers & Danielson, 1999). We conjecture that by keeping more word candidates activated during the word recognition process, lexical guidance from the critical words may become less compelling, resulting in decreased lexically induced perceptual learning.

In addition to age, lexical behavior during the lexical-decision task also predicts the strength of the perceptual-learning effect: Listeners who more often gave “yes” responses to ambiguous items during the lexical-decision task showed stronger perceptual learning. In other words, listeners vary in how far they shift their phoneme categories on the basis of lexical guidance. This suggests that participants generally do not tolerate odd-sounding items during exposure (accepting them as words) while leaving their category boundaries unaltered. As far as we are aware, this is the first time that a link between the frequency of “yes” responses to ambiguous items during a lexical-decision task and the strength of the perceptual-learning effect has been shown.

An unlearning effect for younger listeners was also reported by Mitterer et al. (2011) for the lexically guided perceptual learning of tones. Our study concerned a consonant contrast with distributed (i.e., nonlocal) acoustic cues, since the consonant also affected the quality of the preceding vowel, and the cues to the tones studied by Mitterer et al. are likewise nonlocal. Taken together, the two sets of results seem to suggest that perceptual learning of a contrast with nonlocal cues differs from learning of the plosive or fricative contrasts used in other studies, which differ primarily in local acoustic cues. An explanation for why such a difference between local and nonlocal cues would affect the stability of the learning effect is, however, lacking.

To conclude, older listeners, like younger listeners, show perceptual learning of a liquid contrast. Together with the results found by McQueen et al. (2012) for 6- and 12-year-olds, these data clearly show that the ability for lexically guided perceptual learning is present across the life span. Nevertheless, an age-related decline in the size of the perceptual-learning effect and an increase in its stability were observed, which may be accounted for by decreased flexibility in the adjustment of phoneme categories or by age-related changes in the dynamics of spoken-word recognition.