Abstract
The effects of perceptual learning of talker identity on the recognition of spoken words and sentences were investigated in three experiments. In each experiment, listeners were trained to learn a set of 10 talkers’ voices and were then given an intelligibility test to assess the influence of learning the voices on the processing of the linguistic content of speech. In the first experiment, listeners learned voices from isolated words and were then tested with novel isolated words mixed in noise. The results showed that listeners who were given words produced by familiar talkers at test showed better identification performance than did listeners who were given words produced by unfamiliar talkers. In the second experiment, listeners learned novel voices from sentence-length utterances and were then presented with isolated words. The results showed that learning a talker’s voice from sentences did not generalize well to identification of novel isolated words. In the third experiment, listeners learned voices from sentence-length utterances and were then given sentence-length utterances produced by familiar and unfamiliar talkers at test. We found that perceptual learning of novel voices from sentence-length utterances improved speech intelligibility for words in sentences. Generalization and transfer from voice learning to linguistic processing were found to be sensitive to the talker-specific information available during learning and test. These findings demonstrate that increased sensitivity to talker-specific information affects the perception of the linguistic properties of speech in isolated words and sentences.
Additional information
This research was supported by NIDCD Research Grant DC-00111 and NIDCD Research Training Grant DC-00012 to Indiana University. Portions of this research were presented at the 125th meeting of the Acoustical Society of America in Ottawa and at the XIIIth International Congress of Phonetic Sciences in Stockholm.
Cite this article
Nygaard, L.C., Pisoni, D.B. Talker-specific learning in speech perception. Perception & Psychophysics 60, 355–376 (1998). https://doi.org/10.3758/BF03206860
DOI: https://doi.org/10.3758/BF03206860