Talker-specific learning in speech perception

Abstract

The effects of perceptual learning of talker identity on the recognition of spoken words and sentences were investigated in three experiments. In each experiment, listeners were trained on the voices of 10 talkers and were then given an intelligibility test to assess how learning the voices influenced the processing of the linguistic content of speech. In the first experiment, listeners learned voices from isolated words and were then tested with novel isolated words mixed in noise. Listeners who heard words produced by familiar talkers at test identified them more accurately than did listeners who heard words produced by unfamiliar talkers. In the second experiment, listeners learned novel voices from sentence-length utterances and were then presented with isolated words. Learning a talker's voice from sentences did not generalize well to the identification of novel isolated words. In the third experiment, listeners learned voices from sentence-length utterances and were then given sentence-length utterances produced by familiar and unfamiliar talkers at test. We found that perceptual learning of novel voices from sentence-length utterances improved the intelligibility of words in sentences. Generalization and transfer from voice learning to linguistic processing were found to be sensitive to the talker-specific information available during learning and test. These findings demonstrate that increased sensitivity to talker-specific information affects the perception of the linguistic properties of speech in isolated words and sentences.

Author information

Corresponding author

Correspondence to Lynne C. Nygaard.

Additional information

This research was supported by NIDCD Research Grant DC-00111 and NIDCD Research Training Grant DC-00012 to Indiana University. Portions of this research were presented at the 125th meeting of the Acoustical Society of America in Ottawa and at the XIIIth International Congress of Phonetic Sciences in Stockholm.

About this article

Cite this article

Nygaard, L.C., Pisoni, D.B. Talker-specific learning in speech perception. Perception & Psychophysics 60, 355–376 (1998). https://doi.org/10.3758/BF03206860

Keywords

  • Speech Signal
  • Speech Perception
  • Perceptual Learning
  • Serial Recall
  • Generalization Test