Abstract
Audiovisual integration (AVI) is well-known during speech perception, but evidence for AVI in speaker identification has been less clear. This chapter reviews evidence for face–voice integration in speaker identification. Links between perceptual representations mediating face and voice identification, tentatively suggested by behavioral evidence more than a decade ago, have been recently supported by neuroimaging data indicating tight functional connectivity between the fusiform face and temporal voice areas. Research that recombined dynamic facial and vocal identities with precise synchrony provided strong evidence for AVI in identifying personally familiar (but not unfamiliar) speakers. Electrophysiological data demonstrate AVI at multiple neuronal levels and suggest that perceiving time-synchronized speaking faces triggers early (∼50–80 ms) audiovisual processing, although audiovisual speaker identity is only computed ∼200 ms later.
Notes
1. It should be noted that, like many other studies, this experiment used static faces. On the one hand, the study is therefore subject to the limitations mentioned earlier; on the other hand, this might be further evidence that even static faces can elicit some crossmodal effects (Joassin, Maurage, Bruyer, Crommelinck, & Campanella, 2004; Joassin et al., 2011).
2. One might speculate that the differences in timing were a consequence of the temporally extended sentence stimuli used in Schweinberger, Kloth, and Robertson (2011) and Schweinberger, Walther, Zäske, and Kovács (2011). However, in as yet unpublished research, we have repeated the same experiment using brief syllabic stimuli similar to those used in the McGurk paradigm, and replicated the crucial results: an early frontocentral negativity around 50–80 ms to bimodal stimuli, and an onset of speaker identity correspondence effects around 250 ms.
References
Andics, A., McQueen, J. M., Petersson, K. M., Gál, V., Rudas, G., & Vidnyánszky, Z. (2010). Neural mechanisms for voice recognition. NeuroImage, 52, 1528–1540.
Belin, P., Bestelmeyer, P. E. G., Latinus, M., & Watson, R. (2011). Understanding voice perception. British Journal of Psychology, 102, 711–725.
Belin, P., Fecteau, S., & Bedard, C. (2004). Thinking the voice: Neural correlates of voice perception. Trends in Cognitive Sciences, 8, 129–135.
Belin, P., & Zatorre, R. J. (2003). Adaptation to speaker’s voice in right anterior temporal lobe. NeuroReport, 14, 2105–2109.
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403, 309–312.
Benson, P. J., & Perrett, D. I. (1991). Perception and recognition of photographic quality facial caricatures: Implications for the recognition of natural images. European Journal of Cognitive Psychology, 3, 105–135.
Bricker, P. D., & Pruzansky, S. (1966). Effects of stimulus content and duration on talker identification. Journal of the Acoustical Society of America, 40, 1441–1449.
Bruce, V., & Young, A. (1986). Understanding face recognition. British Journal of Psychology, 77, 305–327.
Bruce, V., & Young, A. (2011). Face perception. Hove, UK: Psychology Press.
Burton, A. M., Bruce, V., & Johnston, R. A. (1990). Understanding face recognition with an interactive activation model. British Journal of Psychology, 81, 361–380.
Calvert, G. A., Brammer, M. J., & Iversen, S. D. (1998). Crossmodal identification. Trends in Cognitive Sciences, 2, 247–253.
Campanella, S., & Belin, P. (2007). Integrating face and voice in person perception. Trends in Cognitive Sciences, 11, 535–543.
Charest, I., Pernet, C. R., Rousselet, G. A., Quinones, I., Latinus, M., Fillion-Bilodeau, S., et al. (2009). Electrophysiological evidence for an early processing of human voices. BMC Neuroscience, 10(127), 1–11.
Colonius, H., Diederich, A., & Steenken, R. (2009). Time-Window-of-Integration (TWIN) model for saccadic reaction time: Effect of auditory masker level on visual-auditory spatial interaction in elevation. Brain Topography, 21, 177–184.
de Gelder, B., & Vroomen, J. (2000). The perception of emotions by ear and by eye. Cognition & Emotion, 14, 289–311.
Ellis, H. D., Jones, D. M., & Mosdell, N. (1997). Intra- and inter-modal repetition priming of familiar faces and voices. British Journal of Psychology, 88, 143–156.
Formisano, E., De Martino, F., Bonte, M., & Goebel, R. (2008). “Who” Is Saying “What”? Brain-based decoding of human voice and speech. Science, 322, 970–973.
Fox, C. J., & Barton, J. J. S. (2007). What is adapted in face adaptation? The neural representations of expression in the human visual system. Brain Research, 1127, 80–89.
Garrido, L., Eisner, F., McGettigan, C., Stewart, L., Sauter, D., Hanley, J. R., et al. (2009). Developmental phonagnosia: A selective deficit of vocal identity recognition. Neuropsychologia, 47, 123–131.
Ghazanfar, A. A., & Schroeder, C. E. (2006). Is neocortex essentially multisensory? Trends in Cognitive Sciences, 10, 278–285.
Green, K. P., Kuhl, P. K., Meltzoff, A. N., & Stevens, E. B. (1991). Integrating speech information across talkers, gender, and sensory modality: Female faces and male voices in the McGurk effect. Perception & Psychophysics, 50, 524–536.
Hagan, C. C., Woods, W., Johnson, S., Calder, A. J., Green, G. G. R., & Young, A. W. (2009). MEG demonstrates a supra-additive response to facial and vocal emotion in the right superior temporal sulcus. Proceedings of the National Academy of Sciences of the United States of America, 106, 20010–20015.
Hanley, J. R., Smith, S. T., & Hadfield, J. (1998). I recognise you but I can’t place you: An investigation of familiar-only experiences during tests of voice and face recognition. Quarterly Journal of Experimental Psychology, 51A, 179–195.
Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4, 223–233.
Joassin, F., Maurage, P., Bruyer, R., Crommelinck, M., & Campanella, S. (2004). When audition alters vision: An event-related potential study of the cross-modal interactions between faces and voices. Neuroscience Letters, 369, 132–137.
Joassin, F., Pesenti, M., Maurage, P., Verreckt, E., Bruyer, R., & Campanella, S. (2011). Cross-modal interactions between human faces and voices involved in person recognition. Cortex, 47, 367–376.
Kawahara, H., & Matsui, H. (2003). Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation. IEEE Proceedings of ICASSP, 1, 256–259.
Kovács, G., Zimmer, M., Bankó, É., Harza, I., Antal, A., & Vidnyánszky, Z. (2006). Electrophysiological correlates of visual adaptation to faces and body parts in humans. Cerebral Cortex, 16, 742–753.
Lander, K., & Chuang, L. (2005). Why are moving faces easier to recognize? Visual Cognition, 12, 429–442.
Legge, G. E., Grossmann, C., & Pieper, C. M. (1984). Learning unfamiliar voices. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 298–303.
Leopold, D. A., O’Toole, A. J., Vetter, T., & Blanz, V. (2001). Prototype-referenced shape encoding revealed by high-level aftereffects. Nature Neuroscience, 4, 89–94.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.
Munhall, K. G., Gribble, P., Sacco, L., & Ward, M. (1996). Temporal constraints on the McGurk effect. Perception & Psychophysics, 58, 351–362.
Natu, V., & O’Toole, A. J. (2011). The neural processing of familiar and unfamiliar faces: A review and synopsis. British Journal of Psychology, 102, 726–747.
Navarra, J., Vatakis, A., Zampini, M., Soto-Faraco, S., Humphreys, W., & Spence, C. (2005). Exposure to asynchronous audiovisual speech extends the temporal window for audiovisual integration. Cognitive Brain Research, 25, 499–507.
Neuner, F., & Schweinberger, S. R. (2000). Neuropsychological impairments in the recognition of faces, voices, and personal names. Brain and Cognition, 44, 342–366.
Pollack, I., Pickett, J. M., & Sumby, W. H. (1954). On the identification of speakers by voice. Journal of the Acoustical Society of America, 26, 403–406.
Robertson, D. M. C., & Schweinberger, S. R. (2010). The role of audiovisual asynchrony in person recognition. Quarterly Journal of Experimental Psychology, 63, 23–30.
Saint-Amour, D., De Sanctis, P., Molholm, S., Ritter, W., & Foxe, J. J. (2007). Seeing voices: High-density electrical mapping and source-analysis of the multisensory mismatch negativity evoked during the McGurk illusion. Neuropsychologia, 45, 587–597.
Sams, M., Aulanko, R., Hämäläinen, M., Hari, R., Lounasmaa, O. V., Lu, S.-T., et al. (1991). Seeing speech: Visual information from lip movements modifies activity in the human auditory cortex. Neuroscience Letters, 127, 141–145.
Schweinberger, S. R. (1996). Recognizing people by faces, names, and voices: Psychophysiological and neuropsychological investigations. University of Konstanz: Habilitation Thesis.
Schweinberger, S. R. (2011). Neurophysiological correlates of face recognition. In A. J. Calder, G. Rhodes, M. H. Johnson, & J. V. Haxby (Eds.), The handbook of face perception (pp. 345–366). Oxford: Oxford University Press.
Schweinberger, S. R., Casper, C., Hauthal, N., Kaufmann, J. M., Kawahara, H., Kloth, N., et al. (2008). Auditory adaptation in voice perception. Current Biology, 18, 684–688.
Schweinberger, S. R., Herholz, A., & Sommer, W. (1997). Recognizing famous voices: Influence of stimulus duration and different types of retrieval cues. Journal of Speech, Language, and Hearing Research, 40, 453–463.
Schweinberger, S. R., Herholz, A., & Stief, V. (1997). Auditory long-term memory: Repetition priming of voice recognition. Quarterly Journal of Experimental Psychology, 50A, 498–517.
Schweinberger, S. R., Kloth, N., & Robertson, D. M. C. (2011). Hearing facial identities: Brain correlates of face-voice integration in person identification. Cortex, 47, 1026–1037.
Schweinberger, S. R., Pickering, E. C., Jentzsch, I., Burton, A. M., & Kaufmann, J. M. (2002). Event-related brain potential evidence for a response of inferior temporal cortex to familiar face repetitions. Cognitive Brain Research, 14, 398–409.
Schweinberger, S. R., Robertson, D., & Kaufmann, J. M. (2007). Hearing facial identities. Quarterly Journal of Experimental Psychology, 60, 1446–1456.
Schweinberger, S. R., Walther, C., Zäske, R., & Kovács, G. (2011). Neural correlates of adaptation to voice identity. British Journal of Psychology, 102, 748–764.
Shah, N. J., Marshall, J. C., Zafiris, O., Schwab, A., Zilles, K., Markowitsch, H. J., et al. (2001). The neural correlates of person familiarity. A functional magnetic resonance imaging study with clinical implications. Brain, 124, 804–815.
Sheffert, S. M., & Olson, E. (2004). Audiovisual speech facilitates voice learning. Perception & Psychophysics, 66, 352–362.
Soto-Faraco, S., & Alsius, A. (2009). Deconstructing the McGurk–MacDonald illusion. Journal of Experimental Psychology: Human Perception and Performance, 35, 580–587.
Stein, B. E., & Stanford, T. R. (2008). Multisensory integration: Current issues from the perspective of the single neuron. Nature Reviews Neuroscience, 9, 255–266.
Stekelenburg, J. J., & Vroomen, J. (2007). Neural correlates of multisensory integration of ecologically valid audiovisual events. Journal of Cognitive Neuroscience, 19, 1964–1973.
Sugiura, M., Shah, N. J., Zilles, K., & Fink, G. R. (2005). Cortical representations of personally familiar objects and places: Functional organization of the human posterior cingulate cortex. Journal of Cognitive Neuroscience, 17, 183–198.
Summerfield, Q., MacLeod, A., McGrath, M., & Brooke, M. (1989). Lips, teeth, and the benefits of lipreading. In A. W. Young & H. D. Ellis (Eds.), Handbook of research on face processing (pp. 223–233). Amsterdam: North-Holland.
van Wassenhove, V., Grant, K. W., & Poeppel, D. (2005). Visual speech speeds up the neural processing of auditory speech. Proceedings of the National Academy of Sciences of the United States of America, 102, 1181–1186.
van Wassenhove, V., Grant, K. W., & Poeppel, D. (2007). Temporal window of integration in auditory-visual speech perception. Neuropsychologia, 45, 598–607.
Van Lancker, D., & Kreiman, J. (1987). Voice discrimination and recognition are separate abilities. Neuropsychologia, 25, 829–834.
Van Lancker, D., Kreiman, J., & Wickens, T. D. (1985). Familiar voice recognition: Patterns and parameters. Part II: Recognition of rate-altered voices. Journal of Phonetics, 13, 39–52.
von Kriegstein, K., Kleinschmidt, A., Sterzer, P., & Giraud, A. L. (2005). Interaction of face and voice areas during speaker recognition. Journal of Cognitive Neuroscience, 17, 367–376.
Walker, S., Bruce, V., & O’Malley, C. (1995). Facial identity and facial speech processing: Familiar faces and voices in the McGurk effect. Perception & Psychophysics, 57, 1124–1133.
Welch, R. B., & Warren, D. H. (1980). Immediate perceptual response to intersensory discrepancy. Psychological Bulletin, 88, 638–667.
Zäske, R., Schweinberger, S. R., & Kawahara, H. (2010). Voice aftereffects of adaptation to speaker identity. Hearing Research, 268, 38–45.
Acknowledgments
The author’s research is supported by grants from the Deutsche Forschungsgemeinschaft (Grants Schw 511/6-2 and Schw 511/10-1) in the context of the DFG Research Unit Person Perception (FOR1097). I am very grateful to Romi Zäske for helpful comments on an earlier draft of this chapter.
Copyright information
© 2013 Springer Science+Business Media New York
Cite this chapter
Schweinberger, S.R. (2013). Audiovisual Integration in Speaker Identification. In: Belin, P., Campanella, S., Ethofer, T. (eds) Integrating Face and Voice in Person Perception. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3585-3_6
Print ISBN: 978-1-4614-3584-6
Online ISBN: 978-1-4614-3585-3
eBook Packages: Biomedical and Life Sciences (R0)