Effects of enriched auditory experience on infants’ speech perception during the first year of life
Infants rapidly learn language in their home environments. Between 6 and 12 months of age, infants’ ability to process the building blocks of speech (i.e., phonetic information) develops quickly, and this ability predicts later language development. Typically developing infants in a monolingual language environment rapidly tune in to the phonetic information of their native language, while their sensitivity to nonnative phonetic information starts to decrease. Yet enriched exposure to a new language during this time significantly improves infants’ sensitivity to the sound contrasts used in that language, compared with a control group without exposure to the new language. More recently, a study examined another type of enriched auditory experience (musical experience) to determine its effect not only on music processing but also on phonetic processing. Results showed that a 1-month laboratory music intervention focusing on rhythm learning enhanced 9-month-old infants’ neural processing not only for music but also for speech. Together, these results suggest that enriched auditory experiences in infancy may improve infants’ general auditory pattern-detection skills and their sensitivity to phonetic information.
Keywords: Infants; Speech perception; Language experience; Music intervention; Learning
In many species, the young are particularly sensitive to environmental inputs at certain periods during development. The barn owl’s ability to localize prey is calibrated by auditory-visual input during an early sensitive period in development; wearing prisms (or ear plugs) alters the mapping during this period (Knudsen 2002). Binocular fusion is dependent on binocular visual input during a critical period early in development; rearing cats with one occluded eye irreversibly alters binocular representation in the visual centers of the cortex (Hubel and Wiesel 1977; Shatz and Stryker 1978). In songbirds, learning their species-typical song depends on experience during a critical temporal window; presentation of conspecific song during that time is essential for normal development (Konishi 1985; Marler 1970). A recent theoretical paper (Werker and Hensch 2015) discusses the nature of the “critical” periods, especially the biological factors that “open” and “close” them. Here, we review work from our laboratory that focuses on one specific time period for human infants’ learning; namely, the “sensitive period” for phonetic learning and the experiential factors that may influence this learning process. We first discuss the developmental trajectory of infants’ abilities to discriminate native and nonnative phonetic contrasts between 6 and 12 months of age, and then several experiential factors we have observed in laboratory studies that influence infants’ ability to discriminate speech sounds during this sensitive period. Lastly, we discuss future directions for research that will help elucidate the mechanisms through which these experiential factors exert their influences.
Early phonetic learning
Social influences on phonetic learning during the sensitive period
After exposure, researchers tested all 4 groups on Mandarin phonetic discrimination. The results from behavioral tests (conditioned head-turn, see Kuhl et al. 2006) demonstrated that only the group exposed to Mandarin in a social context by live humans learned the Mandarin contrast. The data demonstrated two things: (a) phonetic learning from first-time exposure can occur at 9 months of age, and (b) phonetic learning from natural language exposure during the sensitive period requires social interaction. Similar second-language exposure experiments using Spanish explored both phonetic and word learning, as well as the degree to which social factors, such as visual attention, during the exposure sessions predict individuals’ learning. The results with Spanish, obtained using brain measures (event-related potential, ERP, measures; see Kuhl et al. 2008), replicated previous findings using Mandarin; additionally, they showed that English phonetic discrimination does not decline as Spanish contrast learning increases; in fact, it increases, as expected (Conboy and Kuhl 2011). Moreover, analyses of the video records revealed a significant positive relationship between infants’ social skills (which allowed them to shift gaze between the foreign-language tutor and the toys as the tutor held new toys and named them in the foreign language) and increased neural responsiveness to the Spanish contrast (Conboy, Brooks, Meltzoff, and Kuhl 2015). These correlations between social responses and brain measures of learning buttress the argument that infants’ social skills are coupled to language learning.
The data on infant speech-perception reviewed above suggest that infants are very sensitive to social language input during the period between 6 and 12 months. Infants’ sensitivity is so high that even a foreign language introduced for the first time at 9 months causes robust phonetic learning when it is delivered in a social context. This leads to the hypothesis that the mechanisms underlying infant speech-perception are somehow “tuned” to language input, delivered socially, during this time. The corollary hypothesis is that only language input can influence these mechanisms at this time.
A recent experiment suggests that the corollary hypothesis must be altered. In the next section, we review the results of an experiment that exposes infants to music in a way that is similar to previous experiments using foreign-language interventions during the sensitive period (Conboy and Kuhl 2011; Kuhl, Tsao, and Liu 2003). In the music intervention, researchers exposed infants to a particular rhythmical structure in music, the triple meter (the waltz), for 12 sessions in a social context, using a randomized control design. The control group experienced similar activities in a social setting, but no music. After 12 sessions, the research team tested both intervention and control infants with violations of rhythmic structure in both music and speech. The results show effects on both music and speech, and reveal activation in the infants’ auditory-sensory and prefrontal cortices. In the remaining sections, we detail these findings and discuss their implications.
Effects of music intervention on infants’ phonetic learning
During the last decade, music training that starts early in development has received increasing attention in the science community as an important early experience, given the growing amount of evidence suggesting the robust and extensive training-related benefits in auditory, language, and cognitive abilities (Kraus and Chandrasekaran 2010; Shahin 2011; Zatorre 2013). Previous studies—using various methodologies, including behavioral, electrophysiological, and neural imaging methods—have demonstrated repeatedly that musically trained adults and children exhibit enhanced processing of musical information (e.g., musical pitch and meter) in comparison to nontrained groups (Fujioka, Ross, Kakigi, Pantev, and Trainor 2006; Geiser, Sandmann, Jancke, and Meyer 2010; Habibi, Cahn, Damasio, and Damasio 2016; Koelsch, Schroger, and Tervaniemi 1999; Pantev et al. 1998; Vuust et al. 2005; Zhao and Kuhl 2015a, b).
More importantly, prior studies have also demonstrated generalization effects in the trained individuals from their early musical experience to other domains, one of the most studied being speech processing. The ability to accurately and efficiently process complex speech sounds is critical in language development, as speech processing in infants can robustly predict language abilities in early childhood (see “Early phonetic learning” section); at the same time, studies have shown that developmental language disorders (e.g., dyslexia, specific language impairment) have origins in auditory processing deficits (Goswami 2011; Tallal and Gaab 2006). So far, researchers have found that musically trained adults and children can better encode the acoustic details in speech at the level of the brainstem, especially when speech is embedded in noise (Bidelman, Weiss, Moreno, and Alain 2014; Parbery-Clark, Skoe, Lam, and Kraus 2009; Parbery-Clark, Tierney, Strait, and Kraus 2012; Strait, Parbery-Clark, O’Connell, and Kraus 2013). At the cortical level, researchers have observed that musically trained individuals process pitch information in both native and foreign speech better than nonmusicians do; one study focusing on the temporal information in speech demonstrated that adult musicians could also track syllable structures in words better (Magne, Schon, and Besson 2006; Marie, Magne, and Besson 2011; Marques, Moreno, Castro, and Besson 2007; Wong, Skoe, Russo, Dees, and Kraus 2007). These cross-domain effects from early music training to speech perception raise theoretically interesting and important questions about the different levels of processing (e.g., lower-level acoustic processing vs. higher-level cognitive skills) affected by early experience and how they can support these observed generalization effects (Kraus and Chandrasekaran 2010).
Following this growing literature, we examined the rich experience of music training in an even earlier developmental stage (9 months of age) for both theoretical and methodological reasons (Zhao and Kuhl 2016). Theoretically, this approach allowed us to compare the effects of music experience during the sensitive period of phonetic learning to other previously studied experiences, such as experience of a foreign language (Kuhl, Tsao, and Liu 2003). Methodologically, (1) we were able to randomly assign infants at this age to complete either a structured laboratory-controlled music intervention (Intervention) or control activities (Control). This approach allowed controlling for effects related to predispositions (e.g., genetics), prior music experience, and the variability in individuals’ music training (e.g., onset, nature, and duration of the music experience); (2) we focused on temporal information processing, which has less experimental data regarding effects derived from early music training. In this study, the Intervention targeted infants’ learning of a specific meter (triple meter—e.g., waltz) and we tested the effects of the Intervention on both music (metrical structure) and speech (syllable structure); (3) we used neural responses, measured by magnetoencephalography (MEG), as outcome measures to compare Intervention and Control infants in the spatial and temporal aspects of their cortical responses.
We predicted enhancement in both music and speech domains, following the rationale that the Intervention—targeting infants’ learning of a specific meter—exerts influence at a higher level of processing. We argued that the Intervention infants would become better at extracting the temporal pattern of complex sounds over time, leading to their ability to make more robust predictions about the timing of future stimuli based on the extracted temporal structure—an ability that would affect both music and speech processing.
The design of the Intervention/Control sessions paralleled our prior studies in the laboratory on infant speech learning at 8–10 months of age (see “Social influences”). Specifically, we recruited 9-month-old infants raised in monolingual English-speaking environments with comparable prior and concurrent music listening experiences at home, whose parents were not performing musicians. We randomly assigned infants to the Intervention or Control group for 12 sessions (15 minutes each), over a 4-week period, of corresponding activity in the laboratory.
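The assignment and dosage described above can be illustrated with a minimal Python sketch. It is not the study's actual randomization procedure: the participant identifiers, the sample size of 20, the fixed seed, and the equal split between groups are all hypothetical choices made for illustration. Only the two group labels, the 12 sessions, and the 15-minute session length come from the text.

```python
import random

SESSIONS = 12              # number of laboratory sessions (from the text)
MINUTES_PER_SESSION = 15   # session length in minutes (from the text)


def assign_groups(infant_ids, seed=0):
    """Randomly split participants into two groups of (near-)equal size.

    The equal split and the seed are assumptions for this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(infant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "Intervention": sorted(shuffled[:half]),
        "Control": sorted(shuffled[half:]),
    }


def total_exposure_minutes(sessions=SESSIONS, minutes=MINUTES_PER_SESSION):
    """Total laboratory activity time per infant, in minutes."""
    return sessions * minutes


# Hypothetical participant list; the real sample is not described here.
infants = [f"infant_{i:02d}" for i in range(1, 21)]
groups = assign_groups(infants)
print(len(groups["Intervention"]), len(groups["Control"]))
print(total_exposure_minutes(), "minutes in total")
```

Under these assumptions, the full schedule amounts to 12 × 15 = 180 minutes (3 hours) of laboratory activity per infant, spread over the 4-week period.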
In the Intervention/Control sessions, we incorporated several key components to maximize infants’ learning specific to the Intervention while reflecting naturalistic infant music classes: (1) Intervention infants experienced various infant tunes and songs only in triple meter (e.g., waltz). We selected triple meter as the target temporal structure because studies have shown that it is a more difficult temporal structure in Western music for infants to process at this age than duple meter (e.g., marching music) (Bergeson and Trehub 2006), yet infants can rapidly learn temporal patterns in the music of their culture (Gerry, Faux, and Trainor 2010; Hannon and Trehub 2005a, b); (2) Intervention infants, with the aid of caregivers, tapped out the musical beats with maracas or their feet, and their caregivers often bounced them in synchronization to the musical beats—activities that are common in infant music classes and effective in infants’ learning of temporal structure (Phillips-Silver and Trainor 2005); (3) the Control sessions offered comparable visits to a laboratory, familiarity with the laboratory environment, levels of social interaction with other infants and caregivers, and levels of motor activity and engagement, but without music. For example, infants, aided by their parents, played with toy cars, blocks, and other objects that required coordinated movements, such as moving and stacking; (4) in both the Intervention and Control sessions, researchers engaged infants in a social setting with 1–2 other infants and their caregivers, a setting demonstrated in previous work to be effective when infants are exposed to a foreign language (Kuhl, Tsao, and Liu 2003). Experimenters facilitated each session by engaging the infants and their caregivers in the activities to a comparable degree.
Our results supported our hypotheses and answered our specific questions, demonstrating that: (1) the Intervention group exhibited a larger mismatch response (MMR) to violations in temporal structure for music (i.e., triple meter) when compared to the Control group; (2) the effects were observed in both temporal (auditory) and prefrontal regions of the cortex (Figure 4b, c); (3) the enhancement in temporal structure processing generalized to the speech domain, reflected by a larger MMR in temporal and prefrontal cortical regions in response to violations of a foreign temporal structure in the Intervention group (Figure 5b, c).
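As a rough illustration of the outcome measure, a mismatch response (the MMR reported above) is conventionally quantified as the difference between the averaged response to deviant (pattern-violating) stimuli and the averaged response to standard stimuli. The Python sketch below computes such a difference wave and its mean amplitude in a time window. The function names, the toy amplitude values, and the window indices are hypothetical, and the sketch does not reproduce the MEG source analysis used in the study.

```python
from statistics import mean


def mismatch_response(standard, deviant):
    """Return the deviant-minus-standard difference wave, sample by sample."""
    if len(standard) != len(deviant):
        raise ValueError("standard and deviant must have the same length")
    return [d - s for s, d in zip(standard, deviant)]


def mean_amplitude(wave, start, stop):
    """Mean amplitude of the wave over samples start..stop-1."""
    return mean(wave[start:stop])


# Toy averaged responses in arbitrary units (synthetic, not study data).
standard = [0, 1, 2, 1, 0]
deviant = [0, 4, 6, 4, 0]

mmr = mismatch_response(standard, deviant)
print(mmr)
print(mean_amplitude(mmr, 1, 4))
```

In a group comparison of the kind described here, a value like this would be computed for each infant and then compared between the Intervention and Control groups.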
We therefore demonstrated that a short-term laboratory-controlled music intervention at 9 months of age that reflects naturalistic infant music classes affects not only infants’ functional processing of temporal structure in music but also—more importantly—infants’ processing of syllable structure in speech. We based our prediction of the generalization effects from the Intervention to speech on the rationale that infants would learn to better attend to and extract auditory patterns in the temporal domain, allowing them to generate—from learned patterns—more robust predictions about the timing of future events. Our results thus strongly supported the idea that such enriched music intervention experience may support the development of a broader set of perceptual skills.
The design of the Intervention, as well as the use of foreign syllable structure in the MEG testing, allows us to compare the current results to our previous experiments examining the effects of foreign-language intervention during this sensitive period of phonetic learning. In the next section, we discuss in more detail the implications of the result showing enhanced sensitivity to foreign syllable structure contrasts.
Summary and discussion
In this article, we have introduced the concept of what we term a “sensitive period” for infants’ phonetic learning between 6 and 12 months of age (Kuhl 2004). Decades of research have demonstrated that, during this period, infants’ ability to discriminate native speech contrasts improves, whereas their ability to discriminate nonnative speech contrasts declines (Kuhl et al. 2006; Werker and Tees 1984). Further, we discussed that infants’ phonetic learning during this sensitive period is highly malleable, depending on the auditory input infants receive at that time. The ability to discriminate nonnative speech contrasts provides a window for studying how inputs during the sensitive period can affect infants’ phonetic learning. In a series of studies, we demonstrated that experience with a foreign language could enhance infants’ ability to discriminate the nonnative speech contrasts in that language. More importantly, language experience during this time needs to be social in nature: the same input delivered through a TV screen did not result in learning (Conboy and Kuhl 2011; Kuhl, Tsao, and Liu 2003). Yet, in our most recent study, we showed that a music intervention targeting rhythm learning during this sensitive period also enhanced infants’ ability to discriminate a nonnative speech contrast that is based on syllable structure differences.
How does the enriched auditory experience of foreign language and music exert its influence on infants’ phonetic learning during the sensitive period? Previous research has demonstrated the influences of cognitive skills on speech perception in this period; 11-month-old monolingual infants show a strong negative correlation between a specific cognitive control skill (inhibitory control) and nonnative speech discrimination (Conboy, Sommerville, and Kuhl 2008; Diamond, Werker, and Lalonde 1994; Lalonde and Werker 1995). The authors’ interpretation is that infants with good inhibitory control skills are better able to ignore speech sounds that are irrelevant to their native language and therefore exhibit lower nonnative speech discrimination, which has been shown to correlate with faster native-language growth (Figure 2; Kuhl et al. 2008). On the other hand, the literature on infants and children raised in bilingual language environments shows that they have enhanced cognitive flexibility compared to their monolingual counterparts (Bialystok and Craik 2010; Kovács and Mehler 2009a, b). We, therefore, speculate that an enriched auditory experience (i.e., foreign language and music) provides complex yet patterned auditory input; when delivered in a social setting, it allows infants to develop enhanced cognitive abilities to switch between inputs and attune their attentional resources to the relevant and important auditory information.
One specific mechanism by which infants can learn to effectively allocate attentional resources is predictive coding. The dynamic attending theory posits that, once the temporal pattern of the input has been extracted, attentional resources are allocated to time windows during which the brain predicts that important information will occur (e.g., musical beats, syllables) (Jones and Boltz 1989). Investigators have demonstrated that infants as young as 3 months of age are able to extract temporal patterns and predict future stimuli based on the extracted information (Basirat, Dehaene, and Dehaene-Lambertz 2014; Emberson, Richards, and Aslin 2015). Our recent data using complex auditory stimuli suggest that a music intervention focusing on temporal information learning may have increased infants’ ability to extract high-level temporal patterns and generate stronger predictions about future stimuli, a skill that they can apply both in music and in speech processing. Future research is warranted to, first, establish the relationships between different general cognitive skills (e.g., inhibition, flexibly switching attention) and infants’ ability to discriminate native and nonnative speech sounds. Then, it will be critical to directly test whether short-term language or music experience, in comparison to no exposure, affects these cognitive skills, which can, in turn, affect phonetic learning during the “sensitive period”. In the longer term, researchers should dissect and systematically examine the various components of these enriched auditory experiences (e.g., social elements, multimodal elements) in order to evaluate the effectiveness of each element and the interactions among them. This will not only enhance our theoretical understanding of infant phonetic learning but will also inform the design of early-education interventions, especially for infants at risk for communication disorders.
- Bekinschtein, T. A., Dehaene, S., Rohaut, B., Tadel, F. O., Cohen, L., & Naccache, L. (2009). Neural signature of the conscious processing of auditory regularities. Proceedings of the National Academy of Sciences of the United States of America, 106(5), 1672–1677. doi: 10.1073/pnas.0809667106.
- Diamond, A., Werker, J. F., & Lalonde, C. (1994). Toward understanding commonalities in the development of object search, detour navigation, categorization, and speech perception. In G. Dawson & K. W. Fischer (Eds.), Human behavior and the developing brain. New York, NY: Guilford Press.
- Emberson, L. L., Richards, J. E., & Aslin, R. N. (2015). Top-down modulation in the infant brain: Learning-induced expectations rapidly affect the sensory cortex at 6 months. Proceedings of the National Academy of Sciences of the United States of America, 112(31), 9585–9590. doi: 10.1073/pnas.1510343112.
- Kuhl, P. K., Conboy, B. T., Coffey-Corina, S., Padden, D., Rivera-Gaxiola, M., & Nelson, T. (2008). Phonetic learning as a pathway to language: New data and native language magnet theory expanded (NLM-e). Philosophical Transactions of the Royal Society B-Biological Sciences, 363(1493), 979–1000. doi: 10.1098/rstb.2007.2154.
- Kuhl, P. K., Tsao, F. M., & Liu, H. M. (2003). Foreign-language experience in infancy: Effects of short-term exposure and social interaction on phonetic learning. Proceedings of the National Academy of Sciences of the United States of America, 100(15), 9096–9101. doi: 10.1073/pnas.1532872100.
- Marler, P. (1970). Birdsong and speech development: Could there be parallels? American Scientist, 58(6), 669–673.
- Vuust, P., Pallesen, K. J., Bailey, C., van Zuijen, T. L., Gjedde, A., Roepstorff, A., et al. (2005). To musicians, the message is in the meter: Pre-attentive neuronal responses to incongruent rhythm are left-lateralized in musicians. Neuroimage, 24(2), 560–564. doi: 10.1016/j.neuroimage.2004.08.039.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.