Distributional learning for speech reflects cumulative exposure to a talker’s phonetic distributions

  • Rachel M. Theodore
  • Nicholas R. Monto
Brief Report


Efficient speech perception requires listeners to maintain an exquisite tension between stability of the language architecture and flexibility to accommodate variation in the input, such as that associated with individual talker differences in speech production. Achieving this tension can be guided by top-down learning mechanisms, wherein lexical information constrains interpretation of speech input, and by bottom-up learning mechanisms, in which distributional information in the speech signal is used to optimize the mapping to speech sound categories. An open question for theories of perceptual learning concerns the nature of the representations that are built for individual talkers: do these representations reflect long-term, global exposure to a talker or rather only short-term, local exposure? Recent research suggests that when lexical knowledge is used to resolve a talker’s ambiguous productions, listeners disregard previous experience with a talker and instead rely on only recent experience, a finding that is contrary to predictions of Bayesian belief-updating accounts of perceptual adaptation. Here we use a distributional learning paradigm in which lexical information is not explicitly required to resolve ambiguous input to provide an additional test of global versus local exposure accounts. Listeners completed two blocks of phonetic categorization for stimuli that differed in voice-onset-time, a probabilistic cue to the voicing contrast in English stop consonants. In each block, two distributions were presented, one specifying /g/ and one specifying /k/. Across the two blocks, variance of the distributions was manipulated to be either narrow or wide. The critical manipulation was order of the two blocks; half of the listeners were first exposed to the narrow distributions followed by the wide distributions, with the order reversed for the other half of the listeners. 
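To make the narrow versus wide manipulation concrete, the following minimal sketch treats /g/ and /k/ as two equal-variance Gaussian distributions over voice-onset-time and computes the resulting ideal-observer identification function. The means and standard deviations are hypothetical illustration values, not the parameters used in the experiment, and the function names are introduced here only for the example.

```python
import math

# Hypothetical illustration values (ms); not the experiment's actual parameters.
MEAN_G, MEAN_K = 20.0, 70.0
SD_NARROW, SD_WIDE = 8.0, 14.0


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def prob_k(vot, mean_g, mean_k, sd):
    """Posterior probability of /k/ given a VOT, assuming equal category priors."""
    like_g = gaussian_pdf(vot, mean_g, sd)
    like_k = gaussian_pdf(vot, mean_k, sd)
    return like_k / (like_g + like_k)


def logistic_slope(mean_g, mean_k, sd):
    """Slope of the log-odds of /k/ per ms of VOT.

    For two equal-variance Gaussians the log-odds are linear in VOT:
    log-odds = ((mean_k - mean_g) / sd**2) * (vot - (mean_g + mean_k) / 2)
    """
    return (mean_k - mean_g) / sd ** 2


if __name__ == "__main__":
    for label, sd in (("narrow", SD_NARROW), ("wide", SD_WIDE)):
        slope = logistic_slope(MEAN_G, MEAN_K, sd)
        curve = [round(prob_k(v, MEAN_G, MEAN_K, sd), 3) for v in range(25, 70, 10)]
        print(label, "slope:", round(slope, 4), "P(/k/) at 25..65 ms:", curve)
```

Under these assumptions the category boundary is identical in both conditions, and only the steepness of the identification function changes with the variance of the input distributions.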
The results showed that for earlier trials, the identification slope was steeper for the narrow-wide group compared to the wide-narrow group, but this difference was attenuated for later trials. The between-group convergence was driven by an asymmetry in learning between the two orders such that only those in the narrow-wide group showed slope movement during exposure, a pattern that was mirrored by computational simulations in which the distributional statistics of the present talker were integrated with prior experience with English. This pattern of results suggests that listeners did not disregard all prior experience with the talker, and instead used cumulative exposure to guide phonetic decisions, which raises the possibility that accommodating a talker’s phonetic signature entails maintaining representations that reflect global experience.
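The contrast between global and local exposure accounts can be illustrated with a deliberately simplified sketch. It is not the simulation reported here: it assumes known category means, represents prior experience with English as a hypothetical variance weighted by a hypothetical pseudo-count, and updates the variance estimate as a count-weighted average. All numeric values and function names are assumptions made for illustration.

```python
import math

# Hypothetical illustration values; not taken from the study.
MEAN_G, MEAN_K = 20.0, 70.0          # category means (ms), assumed known
PRIOR_VAR, PRIOR_N = 14.0 ** 2, 100  # stand-in for prior experience with English
NARROW = (8.0 ** 2, 100)             # (variance, number of tokens) per block
WIDE = (14.0 ** 2, 100)


def pooled_variance(prior_var, prior_n, blocks):
    """Count-weighted average of the prior variance and each block's variance."""
    total_n = prior_n
    weighted_sum = prior_n * prior_var
    for block_var, block_n in blocks:
        total_n += block_n
        weighted_sum += block_n * block_var
    return weighted_sum / total_n


def slope(variance):
    """Log-odds slope of the identification function for equal-variance Gaussians."""
    return (MEAN_K - MEAN_G) / variance


def slopes_by_block(order, cumulative=True):
    """Identification slope after each block under a cumulative or local account."""
    seen, result = [], []
    for block in order:
        seen.append(block)
        if cumulative:
            # Global account: prior plus everything heard from the talker so far.
            variance = pooled_variance(PRIOR_VAR, PRIOR_N, seen)
        else:
            # Local account: only the most recent block counts.
            variance = block[0]
        result.append(slope(variance))
    return result


if __name__ == "__main__":
    for cumulative in (True, False):
        name = "cumulative" if cumulative else "local only"
        print(name, "narrow-wide:", slopes_by_block([NARROW, WIDE], cumulative))
        print(name, "wide-narrow:", slopes_by_block([WIDE, NARROW], cumulative))
```

In this sketch the cumulative account yields a steeper slope for the narrow-wide order after the first block and identical slopes for the two orders after the second block, whereas the local account predicts that the group difference simply reverses. The sketch does not attempt to capture the asymmetry in learning between the two orders.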


Keywords: Speech perception; Perceptual learning; Computational models; Distributional learning



This work was supported by NIH NIDCD grant R21DC016141 to RMT and by the Acoustical Society of America Raymond H. Stetson Scholarship in Phonetics and Speech Science to NRM. The views expressed here reflect those of the authors and not the NIH or the NIDCD. We express gratitude to Stephen Graham for his assistance with data collection and to Emily Myers for fruitful discussion and feedback on this manuscript.

Supplementary material

13423_2018_1551_MOESM1_ESM.docx (245 kb)



Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  1. Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, USA
  2. Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, USA
