Experimental Brain Research, Volume 208, Issue 3, pp 447–457

Multistage audiovisual integration of speech: dissociating identification and detection

  • Kasper Eskelund
  • Jyrki Tuomainen
  • Tobias S. Andersen
Research Article

Abstract

Speech perception integrates auditory and visual information. This is evidenced by the McGurk illusion, in which seeing the talking face alters the auditory phonetic percept, and by the audiovisual detection advantage, in which seeing the talking face improves the detectability of the acoustic speech signal. Here, we show that identification of phonetic content and detection can be dissociated as speech-specific and non-specific audiovisual integration effects. To this end, we employed synthetically modified stimuli: sine wave speech (SWS), an impoverished speech signal that only observers informed of its speech-like nature recognize as speech. While the McGurk illusion occurred only for informed observers, the audiovisual detection advantage occurred for naïve observers as well. This finding supports a multistage account of audiovisual integration of speech, in which the many attributes of the audiovisual speech signal are integrated by separate integration processes.


Keywords: Audiovisual speech perception · Sine wave speech · Multisensory integration · Speech identification · Speech detection · McGurk illusion


Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Kasper Eskelund (1, 4), corresponding author
  • Jyrki Tuomainen (2, 3)
  • Tobias S. Andersen (4)

  1. Center for Visual Cognition, Department of Psychology, University of Copenhagen, Copenhagen K, Denmark
  2. Department of Logopedics, Åbo Akademi, Åbo, Finland
  3. Speech Hearing and Phonetic Sciences, Division of Psychology and Language Sciences, University College London, London, UK
  4. Cognitive Systems, Department of Informatics and Mathematical Modeling, Building 321, Technical University of Denmark, Lyngby, Denmark