Subcortical Plasticity Following Perceptual Learning in a Pitch Discrimination Task
Practice can lead to dramatic improvements in the discrimination of auditory stimuli. In this study, we investigated changes of the frequency-following response (FFR), a subcortical component of the auditory evoked potentials, after a period of pitch discrimination training. Twenty-seven adult listeners were trained for 10 h on a pitch discrimination task using one of three different complex tone stimuli. One had a static pitch contour, one had a rising pitch contour, and one had a falling pitch contour. Behavioral measures of pitch discrimination and FFRs for all the stimuli were measured before and after the training phase for these participants, as well as for an untrained control group (n = 12). Trained participants showed significant improvements in pitch discrimination compared to the control group for all three trained stimuli. These improvements were partly specific for stimuli with the same pitch modulation (dynamic vs. static) and with the same pitch trajectory (rising vs. falling) as the trained stimulus. Also, the robustness of FFR neural phase locking to the sound envelope increased significantly more in trained participants compared to the control group for the static and rising contour, but not for the falling contour. Changes in FFR strength were partly specific for stimuli with the same pitch modulation (dynamic vs. static) of the trained stimulus. Changes in FFR strength, however, were not specific for stimuli with the same pitch trajectory (rising vs. falling) as the trained stimulus. These findings indicate that even relatively low-level processes in the mature auditory system are subject to experience-related change.
Keywords: auditory training, FFR, F0 discrimination, evoked potentials
Sounds whose waveforms repeat periodically elicit a sensation of pitch, which plays a fundamental role in the perception of speech and music, as well as in the segregation of concurrent sound sources (Plack and Oxenham 2005a). Auditory nerve fibers “phase lock” to the waveform of pure tones, that is, neural firing tends to occur at the same time during each cycle of the sound wave. Several models of pitch assume that this temporal information, which is preserved in upper brainstem structures for fundamental frequencies (F0s) up to several hundred hertz (Liu et al. 2006), is used to encode pitch (e.g., Meddis and O'Mard 2006; McLachlan 2009).
Musicians (Wong et al. 2007; Bidelman et al. 2009) and speakers of a tone language (Krishnan et al. 2005, 2009a, b; Swaminathan et al. 2008) show more robust subcortical “phase locking” in response to pitch-evoking sounds than English speakers without musical experience. This has been demonstrated using scalp recordings of the frequency-following response (FFR), an evoked potential that reflects neural phase locking of brainstem nuclei (inferior colliculus and lateral lemniscus; Smith et al. 1975; Gardi et al. 1979) to the envelope of a sound. The enhancement of the FFR in musicians and speakers of a tone language has been explained as the result of subcortical plasticity driven by the extensive practice these listeners have in identifying and discriminating sounds on the basis of pitch. However, FFR differences between expert and naive pitch listeners may be caused by factors other than neural plasticity (Monaghan et al. 1998), such as genetic predispositions. Moreover, even assuming that FFR enhancements in expert pitch listeners reflect subcortical plasticity, it remains unclear whether this plasticity is limited to a critical developmental period or persists into adulthood. A more direct approach to the study of neural plasticity is to measure neural activity before and after a period of training. The only study to have used this approach in adults found more robust FFR phase locking to the waveform of one of three tones in a group of English speakers after they were trained on an auditory identification task with pseudo-words that had the same pitch contours as Mandarin tones (Song et al. 2008). The lack of a control group in that study, however, does not allow the unequivocal conclusion that the FFR changes between the pre- and post-training recordings were due to auditory training per se. One purpose of the present study was to provide a more rigorous test of the experience-dependent plasticity of the FFR.
To this end, we compared FFR changes between a group of adult listeners who completed an auditory training protocol of ten 1-h sessions and a control group that did not receive any training. FFR enhancements in Mandarin speakers have been found to be specific to stimuli with the same pitch contour as Mandarin tones (Xu et al. 2006; Krishnan et al. 2009a). To assess the specificity of FFR training effects with respect to pitch contour, we trained each participant with one of three stimuli (with a rising, falling, or static pitch contour) and assessed FFR changes after training for all three stimuli. Changes in performance on the behavioral discrimination task were also assessed for all stimuli.
Thirty-nine participants (16 females; 35 right-handed, two ambidextrous) completed the experiment. The participants ranged in age between 19 and 35 years (mean = 23, SD = 2). All had normal hearing in both ears, with absolute pure-tone thresholds below 20 dB HL at octave frequencies from 250 to 8000 Hz. None had prior experience with psychoacoustic tasks or musical training. All participants gave written informed consent and were paid an hourly wage for their participation in the experiment. All procedures of the study were approved by the Department of Psychology Ethics Committee, Lancaster University.
For the behavioral sessions, the stimuli were generated digitally with 32-bit resolution and a 48-kHz sampling rate on a Macintosh workstation. The stimuli were played through an M-AUDIO Firewire 410 DAC and presented binaurally via Sennheiser HD580 headphones. For the FFR sessions, all stimuli were generated digitally with 16-bit resolution and a 40-kHz sampling rate. The stimulus files were played through a DAC included in the evoked potentials data acquisition system (Intelligent Hearing Systems–Smart EP), and presented binaurally through mu-metal shielded ER2 insert earphones. Binaural FFRs have greater amplitude than FFRs recorded monaurally (Clark et al. 1997). The signal-to-noise ratio for binaural FFRs should therefore be greater than for monaural FFRs, and allow for more accurate FFR measurement.
Participants were randomly assigned to one of three experimental groups (G-Up, n = 9; G-Down, n = 9; G-Static, n = 9) or to a control group (G-Control, n = 12). All participants took part in a preliminary session in which they were familiarized with the stimuli and procedures of the experiment by running two blocks of the discrimination task for each stimulus. An additional familiarization block for each stimulus was run during the first behavioral threshold session. Pre-training FFRs and behavioral discrimination thresholds were then measured in separate, successive sessions. During the training phase, participants in the experimental groups completed ten sessions of the auditory discrimination task. In each training session, participants in the G-Up and G-Down groups completed 20 blocks of the discrimination task with the S-Up and S-Down stimuli, respectively, while participants in the G-Static group completed 18 blocks with the S-Static stimulus. The mean duration of the training phase was 27 days. Participants in the control group waited for a similar amount of time (mean 32 days) without receiving any training. After the training phase, FFRs and behavioral thresholds were measured again in three separate sessions.
Pre- and post-testing thresholds for the discrimination task were measured with a three-interval, three-alternative forced-choice task using an adaptive procedure. On each trial, three observation intervals, separated by 500 ms and each containing a harmonic complex tone, were presented. For the static stimulus, two randomly chosen observation intervals (standard intervals) contained a complex tone with a fixed F0 of 140 Hz, and the remaining interval (comparison interval) contained a complex tone with a lower F0, which was varied adaptively. For the dynamic stimuli, two randomly chosen observation intervals (standard intervals) contained a complex tone with a fixed FM duration of 400 ms, and the remaining interval (comparison interval) contained a complex tone with a shorter FM duration, which was varied adaptively. The listener was asked to indicate, by pressing a key on a numeric keypad, which tone sounded different from the other two (odd-one-out task). Feedback was always provided at the end of each trial by a colored light on the computer screen. For the static stimulus, a two-down one-up adaptive rule tracking the 70.7%-correct point on the psychometric function was used (Levitt 1971). The percentage F0 difference between the complex tones in the standard and comparison intervals was initially set at 20% and was increased after an incorrect response, and decreased after two consecutive correct responses, by a factor of 2 for the first four turnpoints and by a factor of 1.414 thereafter. The maximum percentage F0 difference allowed was 80%. Sixteen turnpoints were measured in each block of trials, and the threshold estimate was taken as the geometric mean of the last 12. For the dynamic stimuli, a modified two-down one-up adaptive rule was used.
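The adaptive track for the static stimulus can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the software used in the study: the function name `run_static_block`, the `respond` callback standing in for the listener, the `max_trials` safety limit, the convention of recording the level at which a reversal is triggered, and switching to the 1.414 step once four turnpoints have been recorded are all our own choices.

```python
import math
from typing import Callable, List


def run_static_block(
    respond: Callable[[float], bool],
    start: float = 20.0,       # initial % F0 difference (from the text)
    max_diff: float = 80.0,    # largest allowed % F0 difference (from the text)
    n_turnpoints: int = 16,
    n_keep: int = 12,
    max_trials: int = 1000,    # safety limit (our addition, hypothetical)
) -> float:
    """Two-down one-up staircase (Levitt 1971) on the percentage F0 difference.

    `respond(diff)` returns True if the odd-one-out interval is picked
    correctly at a difference of `diff` percent. Returns the geometric
    mean of the last `n_keep` turnpoint levels.
    """
    level = start
    n_correct = 0        # consecutive correct responses since the last change
    direction = 0        # -1 after a decrease, +1 after an increase, 0 at start
    turnpoints: List[float] = []
    trials = 0
    while len(turnpoints) < n_turnpoints:
        if trials >= max_trials:
            raise RuntimeError("staircase did not reach enough turnpoints")
        trials += 1
        # Step factor 2 until four turnpoints have been recorded, then 1.414.
        factor = 2.0 if len(turnpoints) < 4 else 1.414
        if respond(level):
            n_correct += 1
            if n_correct < 2:
                continue              # a single correct response: no change
            n_correct = 0
            new_direction = -1        # two correct in a row: make it harder
        else:
            n_correct = 0
            new_direction = +1        # one incorrect response: make it easier
        if direction != 0 and new_direction != direction:
            turnpoints.append(level)  # direction reversed: record a turnpoint
        direction = new_direction
        if new_direction < 0:
            level /= factor
        else:
            level = min(level * factor, max_diff)
    last = turnpoints[-n_keep:]
    return math.exp(sum(math.log(x) for x in last) / len(last))
```

With a deterministic simulated listener such as `run_static_block(lambda d: d >= 5.0)`, the track settles into an oscillation around the 5% criterion and returns an estimate close to it. Real listeners respond probabilistically, which is why the geometric mean over 12 turnpoints is used. A listener who is never correct (or always correct) produces no reversals, so the sketch raises an error instead of looping forever.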
The percentage FM duration difference between the complex tones in the standard and comparison intervals was initially set at 50% and was increased after an incorrect response, and decreased after two consecutive correct responses, by a factor of 2 for the first four threshold estimation points and by a factor of 1.414 thereafter. The threshold estimation points were either turnpoints or points at which the listener gave an incorrect response after the FM duration difference had reached its upper limit (99%). A block was terminated after 16 threshold estimation points had been collected, and the threshold estimate was taken as the geometric mean of the last 12 such points. For both static and dynamic stimuli, the discrimination threshold for each stimulus was computed as the geometric mean of the threshold estimates from five blocks of trials. The change in performance between the pre- and post-training threshold assessment sessions was quantified as the ratio of the pre- to the post-training threshold.
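The modified rule for the dynamic stimuli (S-Up, S-Down), together with the averaging over blocks and the pre/post ratio, can be sketched as below. This is a hedged illustration, not the study's code: the names `run_dynamic_block`, `stimulus_threshold`, and `learning_ratio`, the `respond` callback, and the `max_trials` limit are hypothetical, and the step-size and turnpoint conventions are assumptions where the text does not pin them down.

```python
import math
from typing import Callable, List, Sequence


def geometric_mean(values: Sequence[float]) -> float:
    return math.exp(sum(math.log(v) for v in values) / len(values))


def run_dynamic_block(
    respond: Callable[[float], bool],
    start: float = 50.0,     # initial % FM duration difference (from the text)
    ceiling: float = 99.0,   # upper limit of the % FM duration difference
    n_points: int = 16,
    n_keep: int = 12,
    max_trials: int = 2000,  # safety limit (our addition, hypothetical)
) -> float:
    """Modified two-down one-up rule: an estimation point is recorded at each
    turnpoint and at each incorrect response given at the ceiling."""
    level = start
    n_correct = 0
    direction = 0            # -1 after a decrease, +1 after an increase
    points: List[float] = []
    trials = 0
    while len(points) < n_points:
        if trials >= max_trials:
            raise RuntimeError("block did not collect enough estimation points")
        trials += 1
        factor = 2.0 if len(points) < 4 else 1.414
        correct = respond(level)
        if correct:
            n_correct += 1
            if n_correct < 2:
                continue     # a single correct response: no change
            n_correct = 0
            new_direction = -1
        else:
            n_correct = 0
            new_direction = +1
        reversal = direction != 0 and new_direction != direction
        ceiling_miss = (not correct) and level >= ceiling
        if reversal or ceiling_miss:
            points.append(level)
        direction = new_direction
        if new_direction < 0:
            level /= factor
        else:
            level = min(level * factor, ceiling)
    return geometric_mean(points[-n_keep:])


def stimulus_threshold(block_estimates: Sequence[float]) -> float:
    """Threshold for one stimulus: geometric mean over the (five) blocks."""
    return geometric_mean(block_estimates)


def learning_ratio(pre: float, post: float) -> float:
    """Pre/post threshold ratio; values above 1 mean the threshold dropped."""
    return pre / post
```

The extra estimation points at the ceiling are what keep a block finite for a listener who cannot yet perform the task: every miss at 99% counts, so `run_dynamic_block(lambda d: False)` ends with an estimate of 99%, whereas a turnpoint-only rule would never terminate.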
Participants reclined comfortably in a double-walled sound-attenuating booth. They were instructed to relax and refrain from extraneous body movements. The electroencephalogram (EEG) was recorded differentially between gold-plated scalp electrodes placed on the midline of the forehead at the hairline and on the seventh cervical vertebra. Another electrode placed on the mid-forehead served as the common ground. Interelectrode impedances were kept below 1 kΩ. The EEG signal was recorded at an 8-kHz sampling rate, bandpass filtered from 50 to 3,000 Hz, and amplified by a factor of 150,000. The stimuli were presented at a rate of two per second, in blocks of 256 with alternating polarity (half in rarefaction and half in condensation polarity). The sum of the waveforms recorded in opposite polarities was used for the analyses. Epochs with voltage changes exceeding 29 μV were automatically discarded and the trial was repeated. Seven blocks were run for each stimulus, and the order of the blocks was randomized. The EEG activity was monitored online; if the EEG was noisy during a block, the block was noted and discarded from subsequent analyses. Either the last six blocks or the six blocks remaining after discarding blocks with noisy recordings were used for the analyses. The overall duration of a session, including electrode placement, was about 1 h 30 min. The FFR waveforms were bandpass filtered offline between 50 and 1,900 Hz with digital finite impulse response (FIR) filters. The high-frequency cutoff was chosen to ensure that harmonic components generated by the transducers did not contaminate the recording.
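Why summing the two polarities isolates the envelope-following response can be shown with synthetic signals. In this sketch (an illustration, not the acquisition system's processing), a 140-Hz component that keeps its sign across polarities stands in for envelope following and a 700-Hz component that inverts stands in for fine-structure following or stimulus artifact; the amplitudes are made up. The 29-μV criterion is read here as a peak-to-peak limit, which is our assumption, and the offline FIR filtering is omitted. All function names are hypothetical.

```python
import math
from typing import List, Sequence

REJECT_UV = 29.0  # rejection criterion reported in the text (microvolts)


def accept_epoch(epoch: Sequence[float], limit_uv: float = REJECT_UV) -> bool:
    """Assumed reading of the criterion: keep the epoch unless its
    peak-to-peak voltage change exceeds `limit_uv`."""
    return max(epoch) - min(epoch) <= limit_uv


def average(epochs: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point mean across equal-length epochs."""
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]


def added_polarity_ffr(rarefaction, condensation) -> List[float]:
    """Average the accepted epochs of each polarity, then add the averages.
    Components that invert with stimulus polarity cancel; components with
    the same sign in both polarities (envelope following) are doubled."""
    rar = average([e for e in rarefaction if accept_epoch(e)])
    con = average([e for e in condensation if accept_epoch(e)])
    return [r + c for r, c in zip(rar, con)]


def demo_max_error(fs: float = 8000.0) -> float:
    """Synthetic check with made-up amplitudes (not real data)."""
    t = [i / fs for i in range(80)]                            # 10 ms of signal
    env = [2.0 * math.cos(2 * math.pi * 140 * x) for x in t]   # same sign in both polarities
    tfs = [3.0 * math.sin(2 * math.pi * 700 * x) for x in t]   # inverts with polarity
    rar = [[e + f for e, f in zip(env, tfs)] for _ in range(2)]
    con = [[e - f for e, f in zip(env, tfs)] for _ in range(2)]
    artifact = list(rar[0])
    artifact[40] += 40.0                                       # simulated movement artifact
    ffr = added_polarity_ffr(rar + [artifact], con)
    # Only the doubled envelope term should survive.
    return max(abs(a - 2.0 * b) for a, b in zip(ffr, env))
```

`demo_max_error()` is only a rounding-level residual: the inverting component cancels in the sum, and the artifact epoch is rejected before averaging (had it been kept, the residual at that sample would be about 13 μV).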
The changes in behavioral thresholds (the ratio of the thresholds measured in the pre- and post-training assessment sessions) were log transformed to improve the normality of the data. Before computing test statistics, the mean and standard deviation of each dependent measure (change in behavioral thresholds, FFR strength) were computed for each combination of stimulus (S-Up, S-Down, S-Static) and group (G-Up, G-Down, G-Static, G-Control). Data points falling more than 2 standard deviations from the group mean for a given stimulus were considered outliers. All the data of a participant with one or more outliers in a given dependent measure were discarded from the analyses of that dependent measure only. Outliers were present only in the FFR measure, for four participants (one in the G-Up group, two in the G-Static group, and one in the G-Control group). All comparisons were planned, except where explicitly stated, and the reported p values are uncorrected. When the test statistic involved a t test between independent samples, the Fligner–Killeen test of homogeneity of variances between the two groups (Conover et al. 1981) was performed first. In the case of unequal variances between the two groups, the Welch–Satterthwaite approximation to the degrees of freedom (Satterthwaite 1946) was applied. Because the expected direction of change for the dependent measures was known, all t tests were one-tailed, except where explicitly stated. For the analysis of the correlations between the behavioral and physiological measures, we used a non-parametric procedure (Spearman's rank correlation) that does not rely on the assumption of normally distributed data. Because the expected sign of the correlations was known, their significance was tested with one-tailed t tests.
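The outlier rule, the Welch–Satterthwaite degrees of freedom, and the Spearman correlation with its t statistic can be sketched with the Python standard library. This is a minimal illustration, not the authors' analysis code: the use of the sample SD, the data layout, and all function names are assumptions, p values are not computed, and the Fligner–Killeen test is left out (SciPy provides it as `scipy.stats.fligner`).

```python
import math
import statistics
from typing import Dict, List, Sequence, Set, Tuple


def outlier_flags(values: Sequence[float], k: float = 2.0) -> List[bool]:
    """Flag values more than k SDs from the mean (sample SD assumed)."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    return [abs(v - m) > k * sd for v in values]


def excluded_participants(group: Dict[str, Sequence[float]], k: float = 2.0) -> Set[str]:
    """`group` maps a hypothetical participant ID to one value per stimulus,
    e.g. (S-Up, S-Down, S-Static). An outlier on any stimulus excludes the
    participant from this dependent measure."""
    ids = list(group)
    n_stimuli = len(group[ids[0]])
    excluded: Set[str] = set()
    for s in range(n_stimuli):
        column = [group[p][s] for p in ids]
        excluded.update(p for p, flag in zip(ids, outlier_flags(column, k)) if flag)
    return excluded


def welch_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Welch t statistic and Welch–Satterthwaite degrees of freedom."""
    vx_n = statistics.variance(x) / len(x)
    vy_n = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx_n + vy_n)
    df = (vx_n + vy_n) ** 2 / (vx_n ** 2 / (len(x) - 1) + vy_n ** 2 / (len(y) - 1))
    return t, df


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            result[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def rho_t(rho: float, n: int) -> float:
    """t statistic (df = n - 2) commonly used to test Spearman's rho."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))
```

Note that with nine participants per group the largest possible |z| in a sample is (n - 1)/sqrt(n), about 2.67, so a ±2 SD criterion can flag at most a single extreme value per stimulus. For production analyses, `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.spearmanr` cover the same tests and also return p values.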
The changes in difference limens (DLs) over the training sessions, displayed in Figure 3, suggest that both groups trained on the FM duration discrimination task and the group trained on the F0 discrimination task showed a protracted decrease in thresholds across the training sessions. For the groups trained on the FM duration discrimination task, this was confirmed by a repeated-measures analysis of variance (ANOVA) on the log-transformed FM duration DLs, with SESSION (1–10) as a within-subjects factor and GROUP (G-Up, G-Down) as a between-subjects factor. This analysis revealed a significant main effect of SESSION [F(9,144) = 4.363, p < 0.001], whereas neither the main effect of GROUP [F(1,16) = 0.138, p = 0.715] nor the GROUP × SESSION interaction [F(9,144) = 0.663, p = 0.741] was significant. These results indicate that thresholds decreased over the training sessions for both the G-Up and G-Down groups. A univariate repeated-measures ANOVA on the log-transformed F0 DLs of the group trained on the F0 discrimination task also revealed a significant effect of SESSION [F(9,72) = 10.363, p < 0.001].
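The degrees of freedom reported for these ANOVAs follow from the design described in the Methods: g = 2 FM-duration training groups of n = 9 each (N = 18), k = 10 training sessions, and n = 9 in G-Static. Assuming no sphericity correction was applied (none is mentioned), they work out as:

```latex
\begin{aligned}
df_{\text{GROUP}} &= g - 1 = 2 - 1 = 1\\
df_{\text{subjects within GROUP}} &= N - g = 18 - 2 = 16\\
df_{\text{SESSION}} &= k - 1 = 10 - 1 = 9\\
df_{\text{GROUP}\times\text{SESSION}} &= (g - 1)(k - 1) = 1 \times 9 = 9\\
df_{\text{SESSION}\times\text{subjects within GROUP}} &= (k - 1)(N - g) = 9 \times 16 = 144\\
df_{\text{SESSION}}^{\,\text{G-Static}} &= k - 1 = 9\\
df_{\text{SESSION}\times\text{subjects}}^{\,\text{G-Static}} &= (k - 1)(n - 1) = 9 \times 8 = 72
\end{aligned}
```

This reproduces F(1,16) for GROUP, F(9,144) for SESSION and GROUP × SESSION, and F(9,72) for SESSION in the G-Static group, consistent with all participants being retained in the behavioral analyses (outliers occurred only in the FFR measure).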
Threshold changes between the pre- and post-testing sessions
Plasticity at the subcortical level
Correlations between behavioral and FFR changes
We found changes in subcortical electrophysiological responses to sounds after a multiple-hour period of pitch discrimination training in adults. These results provide direct evidence of short-term subcortical plasticity in adults. This plasticity consisted of more robust phase locking of the FFR to the static or dynamic F0 of the trained stimuli. More robust FFR phase locking to the F0 of a stimulus can reflect either a greater accuracy of phase locking of single fibers to the F0 of the stimulus, or a greater proportion of fibers phase locking to the stimulus period. The latter may be achieved either through recruitment of additional fibers phase locking to the stimulus period or through the inhibition of fibers firing at different periods. Inhibitory and excitatory circuits local to the brainstem (Yang and Pollak 1997; Burger and Pollak 1998) may mediate such changes in phase locking selectivity.
The specificity of the FFR enhancements we found with respect to pitch shape (dynamic vs. static) suggests that different mechanisms may be affected by training with static and dynamic pitch contours. FFR enhancements in Mandarin speakers are greater for tonal segments with a dynamic pitch contour (Krishnan et al. 2009b; Swaminathan et al. 2008). The identification of Mandarin tones requires the discrimination between different shapes of F0 contours rather than the discrimination of static F0 contours differing in F0 height. Our results are consistent with the idea that long-term practice with dynamic pitch stimuli in Mandarin speakers may affect FFR mechanisms specific to dynamic pitch contours (Krishnan and Gandour 2009). We did not find evidence, however, that for dynamic pitch stimuli, FFR enhancements are specific to the pitch trajectory (specificity for the rising vs. falling stimulus). FFR enhancement specificity for particular pitch trajectories has been previously found in Mandarin speakers (Krishnan et al. 2009a). Interestingly, the present behavioral results showed specificity of learning with respect to pitch trajectory. This suggests that the representation of the rising and falling pitch stimuli was differentially affected by learning at a higher processing level than the one probed by the FFR. It is possible that such specificity of learning in high-level stimulus representations guides, during long-term learning, the specificity of FFR plasticity observed in Mandarin speakers.
We also found that changes in behavioral performance on the pitch discrimination tasks correlated with changes in FFR strength for the stimuli with a rising and a static pitch contour. These correlations suggest that increases in FFR strength may contribute directly to improvements in the perception of the stimuli. However, the proportion of variance in the changes in behavioral thresholds explained by the changes in FFR strength was relatively small. The fact that a number of participants performed close to floor level in the first behavioral threshold assessment session may have reduced the strength of the observed correlations. In addition, improvements in a perceptual discrimination task are likely to reflect improvements in stimulus encoding at several levels of sensory processing (Ahissar and Hochstein 2004). Improved perceptual discrimination may also reflect improvements in other processes involved in the discrimination task, such as the attentional selection of task-relevant information (Goldstone 1998; Amitay 2009). The results of our study suggest that improvements in the encoding of the stimuli at the level of the brainstem make a small but significant contribution to short-term pitch discrimination learning.
Subcortical plasticity in the auditory system
The sensitivity of the FFR to multiple-hour auditory discrimination training shows that the human auditory system is susceptible to plasticity at a relatively peripheral level of sensory processing even in adulthood. Previous reports of plasticity in the adult human auditory system have generally been limited to cortical measures of auditory processing. There is, however, increasing evidence that short-term training (de Boer and Thornton 2008; Song et al. 2008) or short-term sensory deprivation (Munro and Blount 2009) can modify subcortical measures of auditory processing even in adulthood. The results of our study complement a growing body of research indicating that long-term experience in the discrimination and identification of pitch-evoking stimuli, acquired through learning a tone language or through musical practice, modifies auditory processing at the level of the brainstem (see Tzounopoulos and Kraus 2009; Krishnan and Gandour 2009 for reviews). These studies have shown that speakers of tone languages (Krishnan et al. 2005, 2010) and musicians (Wong et al. 2007; Bidelman et al. 2009) have enhanced subcortical phase locking to the envelope of periodic sounds compared with English speakers without musical experience. In speakers of a tone language, these effects are present for both speech-like and non-speech stimuli, and are greater for tonal segments with highly accelerated F0 contours (Xu et al. 2006; Swaminathan et al. 2008; Krishnan et al. 2009b), which are characteristic of tone languages (Eady 1982). These effects have been measured in native speakers of a tone language and in musicians who started practicing during childhood. It is known that certain forms of neural plasticity are possible only during limited critical periods early in development (Hensch 2004). These include the postnatal reorganization of tonotopic maps in the auditory cortex of rats (Chang and Merzenich 2003; de Villers-Sidani et al. 2007), the alignment of auditory space maps in the barn owl midbrain to altered visual space maps (Knudsen et al. 2000), and the proficient acquisition of language after cochlear implantation in humans (Manrique et al. 1999; Harrison et al. 2005). The results of our study show that FFR plasticity is not limited to a critical period during childhood, although FFR plasticity may still be greater during early development. The larger FFR enhancement effects observed in speakers of a tone language (e.g., Krishnan et al. 2005) compared with the present study are compatible with this hypothesis. However, these differences may also be due to differences in the amount of “training”, which is measured in years for native tone-language speakers but only in hours for the participants of our study. The correlations found by Wong et al. (2007) between FFR measures of pitch processing for some tones and years of musical training in musicians suggest that, although experience-dependent FFR enhancement effects can be measured after only a few hours of training, they build up progressively with further training.
In principle, neurophysiological differences between different categories of listeners may also be explained by factors other than neural plasticity (Monaghan et al. 1998). The results of our study suggest that neural plasticity is likely to be the cause of the FFR enhancements in “pitch experts” (musicians and speakers of a tone language) that have been reported by previous studies. A comprehensive understanding of interindividual differences in pitch-related abilities and their neurophysiological correlates, however, requires taking into account the possible contribution of other factors to these differences, such as genetic predispositions. It has been shown, for example, that the ability to identify incorrect tones in familiar melodies is a highly heritable trait, with genetic factors explaining up to 80% of variability in this ability (Drayna et al. 2001). Moreover, there is evidence that the adoption of tone languages is associated with the frequency in the population of specific alleles of two genes related to brain growth, and this association is hard to explain by geographical or historical factors (Dediu and Ladd 2007). Studies comparing pitch-processing measures between different populations of listeners cannot disentangle the contribution of neural plasticity from the contribution of genetic factors to the differences measured. Studies comparing pitch-processing measures before and after a period of auditory training, as well as studies comparing pitch-processing measures between populations which are unlikely to have systematic genetic differences, but differ in their experience with pitch discrimination (e.g., Chinese Mandarin speakers vs. Chinese English speakers) can best identify the contribution of neural plasticity to pitch processing abilities.
The exact mechanisms underlying the plasticity of the FFR remain unknown. In a recent study, de Boer and Thornton (2008) found that activity of the medial olivocochlear bundle, which sends efferent signals from the brainstem to the cochlea and is part of the corticofugal efferent system, reflected improvements on a speech-in-noise discrimination task after a period of auditory discrimination training. The corticofugal system, which projects from the auditory cortex to all major brainstem nuclei (Winer 2006), is likely to play a crucial role in subcortical plasticity. Bajo et al. (2010) have shown that ferrets subjected to monaural sensory deprivation cannot relearn to localize sound accurately after damage to the corticocollicular pathway, which is part of the corticofugal system. Studies in other non-human species have demonstrated short-term changes in the frequency selectivity of neurons in the inferior colliculus and cochlear nucleus after auditory fear conditioning or focal electrical stimulation of the auditory cortex (Suga and Ma 2003; Suga 2008; Luo et al. 2008). Subcortical plasticity elicited by electrical stimulation of the auditory cortex can be explained only by the activation of the corticofugal system. It is possible that the subcortical plasticity observed in the present experiment depends on a similar tuning of the F0 specificity of neurons under the influence of descending projections.
We would like to thank R. P. Carlyon, K. Mattock, M. Turgeon, and two anonymous reviewers for constructive comments. S. Carcagno was supported in part by an EPSRC doctoral training award.
- Bidelman GM, Gandour JT, Krishnan A (2009) Cross-domain effects of music and language experience on the representation of pitch in the human auditory brainstem. J Cogn Neurosci (in press)
- Conover W, Johnson M, Johnson MA (1981) Comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics 23:351–361
- Eady SJ (1982) Differences in the F0 patterns of speech: tone language versus stress language. Lang Speech 25:29–42
- Plack CJ, Oxenham AJ (2005a) The present and future of pitch. In: Plack CJ, Oxenham AJ (eds) Pitch: neural coding and perception. Springer, New York
- Plack CJ, Oxenham AJ (2005b) The psychophysics of pitch. In: Plack CJ, Oxenham AJ (eds) Pitch: neural coding and perception. Springer, New York
- Satterthwaite FE (1946) An approximate distribution of estimates of variance components. Biometrics Bulletin 2:110–114