Effects of talker continuity and speech rate on auditory working memory

  • Sung-Joo Lim
  • Barbara G. Shinn-Cunningham
  • Tyler K. Perrachione
Perceptual/Cognitive Constraints on the Structure of Speech Communication: In Honor of Randy Diehl


Speech processing is slower and less accurate when listeners encounter speech from multiple talkers compared to one continuous talker. However, interference from multiple talkers has been investigated only using immediate speech recognition or long-term memory recognition tasks. These tasks reveal opposite effects of speech processing time on speech recognition: while fast processing of multi-talker speech impedes immediate recognition, it also results in more abstract and less talker-specific long-term memories for speech. Here, we investigated whether and how processing multi-talker speech disrupts working memory maintenance, an intermediate stage between perceptual recognition and long-term memory. In a digit sequence recall task, listeners encoded seven-digit sequences and recalled them after a 5-s delay. Sequences were spoken by either a single talker or multiple talkers at one of three presentation rates (0-, 200-, and 500-ms inter-digit intervals). Listeners’ recall was slower and less accurate for sequences spoken by multiple talkers than by a single talker. Especially for the fastest presentation rate, listeners were less efficient when recalling sequences spoken by multiple talkers. Our results reveal that talker-specificity effects for speech working memory are most prominent when listeners must rapidly encode speech. These results suggest that, like immediate speech recognition, working memory for speech is susceptible to interference from variability across talkers. While many studies ascribe effects of talker variability to the need to calibrate perception to talker-specific acoustics, these results are also consistent with the idea that a sudden change of talkers disrupts attentional focus, interfering with efficient working-memory processing.


Talker adaptation; Speech perception; Auditory working memory; Recall efficiency; Auditory streaming



This work was supported by NIH grant R03DC014045 and a Brain and Behavioral Research Foundation NARSAD Young Investigator grant to TKP and NIH grant R01DC009477 to BGSC. SJL was supported by NIH training grant T32DC013017. We thank Yaminah Carter for her assistance.



Copyright information

© The Psychonomic Society, Inc. 2019

Authors and Affiliations

  • Sung-Joo Lim (1, 2)
  • Barbara G. Shinn-Cunningham (2)
  • Tyler K. Perrachione (1)
  1. Department of Speech, Language, and Hearing Sciences, Boston University, Boston, USA
  2. Biomedical Engineering, Boston University, Boston, USA
