Automatic audiovisual integration in speech perception
Two experiments aimed to determine whether features of both the visual and acoustical inputs are always merged into the perceived representation of speech, and whether this audiovisual integration relies on cross-modal binding functions or on imitation. In a McGurk paradigm, observers repeated aloud a string of phonemes uttered by an actor (acoustical presentation of the phonemic string), while the actor's mouth mimicked the pronunciation of a different string (visual presentation). In a control experiment, participants read the same strings printed as letters; this condition served to analyze the voice pattern and the lip kinematics while controlling for imitation. In both the control experiment and the congruent audiovisual presentation, i.e. when the articulatory mouth gestures matched the emitted string of phonemes, the voice spectrum and the lip kinematics varied according to the pronounced string of phonemes. In the McGurk paradigm, participants were unaware of the incongruence between the visual and acoustical stimuli. Acoustical analysis of the participants' spoken responses revealed three distinct patterns: fusion of the two stimuli (the McGurk effect), repetition of the acoustically presented string of phonemes, and, less frequently, repetition of the string of phonemes corresponding to the mouth gestures mimicked by the actor. However, analysis of the latter two response types showed that the second formant (F2) of the participants' voice spectra always differed from the value recorded in the congruent audiovisual presentation, approaching instead the F2 value of the string of phonemes presented in the other, apparently ignored, modality. The lip kinematics of participants repeating the acoustically presented string of phonemes were influenced by observation of the lip movements mimicked by the actor, but only when pronouncing a labial consonant.
The data are discussed in favor of the hypothesis that features of both the visual and acoustical inputs always contribute to the representation of a string of phonemes, and that cross-modal integration occurs by extracting the mouth articulation features peculiar to the pronunciation of that string of phonemes.
Keywords: McGurk effect · Audiovisual integration · Voice spectrum analysis · Lip kinematics · Imitation
We wish to thank Paola Santunione and Andrea Candiani for their help in carrying out the experiments, and Dr. Cinzia Di Dio for her comments on the manuscript. The work was supported by a grant from MIUR (Ministero dell'Istruzione, dell'Università e della Ricerca) to M.G.