Speech alignment is the tendency for interlocutors to unconsciously imitate one another’s speaking style. Alignment also occurs when a talker is asked to shadow recorded words (e.g., Shockley, Sabadini, & Fowler, 2004). In two experiments, we examined whether alignment could be induced with visual (lipread) speech and with auditory speech. In Experiment 1, we asked subjects to lipread and shadow out loud a model silently uttering words. The results indicate that shadowed utterances sounded more similar to the model’s utterances than did subjects’ nonshadowed read utterances. This suggests that speech alignment can be based on visual speech. In Experiment 2, we tested whether raters could perceive alignment across modalities. Raters were asked to judge the relative similarity between a model’s visual (silent video) utterance and subjects’ audio utterances. The subjects’ shadowed utterances were again judged as more similar to the model’s than were read utterances, suggesting that raters are sensitive to cross-modal similarity between aligned words.
Arnold, P., & Hill, F. (2001). Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact. British Journal of Psychology, 92, 339–355.
Calvert, G. A., Bullmore, E., Brammer, M. J., Campbell, R., Iversen, S. D., Woodruff, P., et al. (1997). Silent lipreading activates the auditory cortex. Science, 276, 593–596.
Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76, 893–910.
Davis, C., & Kim, J. (2001). Repeating and remembering foreign language words: Implications for language teaching systems. Artificial Intelligence Review, 16, 37–47.
Fadiga, L., Craighero, L., Buccino, G., & Rizzolatti, G. (2002). Speech listening specifically modulates the excitability of tongue muscles: A TMS study. European Journal of Neuroscience, 15, 399–402.
Fowler, C. A. (2004). Speech as a supermodal or amodal phenomenon. In G. A. Calvert, C. Spence, & B. E. Stein (Eds.), The handbook of multisensory processing (pp. 189–201). Cambridge, MA: MIT Press.
Fowler, C. A., Brown, J. M., Sabadini, L., & Weihing, J. (2003). Rapid access to speech gestures in perception: Evidence from choice and simple response time tasks. Journal of Memory & Language, 49, 396–413.
Gentilucci, M., & Bernardis, P. (2007). Imitation during phoneme production. Neuropsychologia, 45, 608–615.
Giles, H., Coupland, N., & Coupland, J. (1991). Accommodation theory: Communication, context, and consequences. In H. Giles, N. Coupland, & J. Coupland (Eds.), Contexts of accommodation: Developments in applied sociolinguistics (pp. 1–68). Cambridge: Cambridge University Press.
Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105, 251–279.
Goldinger, S. D., & Azuma, T. (2004). Episodic memory reflected in printed word naming. Psychonomic Bulletin, 11, 716–722.
Grant, K. W., & Seitz, P. F. (2000). The use of visible speech cues for improving auditory detection of spoken sentences. Journal of the Acoustical Society of America, 108, 1197–1208.
Gregory, S. W. (1990). Analysis of fundamental frequency reveals covariation in interview partners’ speech. Journal of Nonverbal Behavior, 14, 237–251.
Kamachi, M., Hill, H., Lander, K., & Vatikiotis-Bateson, E. (2003). “Putting the face to the voice”: Matching identity across modality. Current Biology, 13, 1709–1714.
Kaufmann, J. M., & Schweinberger, S. R. (2005). Speaker variations influence speechreading speed for dynamic faces. Perception, 34, 595–610.
Kerzel, D., & Bekkering, H. (2000). Motor activation from visible speech: Evidence from stimulus-response compatibility. Journal of Experimental Psychology: Human Perception & Performance, 26, 634–647.
Kozhevnikov, V., & Chistovich, L. (1965). Speech: Articulation and perception (JPRS Publication 50, 543). Washington, DC: Joint Publications Research Service.
Kuĉera, H., & Francis, W. (1967). Computational analysis of presentday American English. Providence, RI: Brown University Press.
Lachs, L., & Pisoni, D. B. (2004a). Crossmodal source identification in speech perception. Ecological Psychology, 16, 159–187.
Lachs, L., & Pisoni, D. B. (2004b). Cross-modal source information and spoken word recognition. Journal of Experimental Psychology: Human Perception & Performance, 30, 378–396.
Lachs, L., & Pisoni, D. B. (2004c). Specification of cross-modal source information in isolated kinematic displays of speech. Journal of the Acoustical Society of America, 116, 507–518.
Lander, K., & Davies, R. (2008). Does face familiarity influence speechreadability? Quarterly Journal of Experimental Psychology, 61, 961–967.
MacSweeney, M., Amaro, E., Calvert, G. A., Campbell, R., David, A. S., McGuire, P., et al. (2000). Silent speechreading in the absence of scanner noise: An event-related fMRI study. NeuroReport, 11, 1729–1733.
MacSweeney, M., Calvert, G. A., Campbell, R., McGuire, P. K., David, A. S., Williams, S. C. R., et al. (2002). Speechreading circuits in people born deaf. Neuropsychologia, 40, 801–807.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.
Meltzoff, A. N., & Moore, M. K. (1997). Explaining facial imitation: A theoretical model. Early Development & Parenting, 6, 179–192.
Mills, A. E. (1987). The development of phonology in the blind child. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 145–162). Hillsdale, NJ: Erlbaum.
Nakamura, M., Iwano, K., & Furui, S. (2008). Differences between acoustic characteristics of spontaneous and read speech and their of effects on recognition performance. Computer Speech & Language, 22, 171–184.
Namy, L. L., Nygaard, L. C., & Sauerteig, D. (2002). Gender differences in vocal accommodation: The role of perception. Journal of Language & Social Psychology, 21, 422–432.
Natale, M. (1975). Convergence of mean vocal intensity in dyadic communication as a function of social desirability. Journal of Personality & Social Psychology, 32, 790–804.
Navarra, J., & Soto-Faraco, S. (2007). Hearing lips in a second language: Visual articulatory information enables the perception of L2 sounds. Psychological Research, 71, 4–12.
Nygaard, L. C. (2005). The integration of linguistic and non-linguistic properties of speech. In D. Pisoni & R. Remez (Eds.), Handbook of speech perception (pp. 390–414). Malden, MA: Blackwell.
Pardo, J. S. (2004). Acoustic-phonetic convergence among interacting talkers. Journal of the Acoustical Society of America, 115, 2608.
Pardo, J. S. (2006). On phonetic convergence during conversational interaction. Journal of the Acoustical Society of America, 119, 2382–2393.
Pardo, J. S., & Remez, R. E. (2006). The perception of speech. In M. Traxler & M. A. Gernsbacher (Eds.), The handbook of psycholinguistics (2nd ed., pp. 201–248). New York: Academic Press.
Porter, R. J., Jr., & Castellanos, F. X. (1980). Speech production measures of speech perception: Rapid shadowing of VCV syllables. Journal of the Acoustical Society of America, 67, 1349–1356.
Porter, R. J., Jr., & Lubker, J. F. (1980). Rapid reproduction of vowel-vowel sequences: Evidence for a fast and direct acoustic-motoric linkage. Journal of Speech & Hearing Research, 23, 593–602.
Reisberg, D., McLean, J., & Goldfield, A. (1987). Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 97–114). Hillsdale, NJ: Erlbaum.
Rosenblum, L. D. (2005). The primacy of multimodal speech perception. In D. Pisoni & R. Remez (Eds.), Handbook of speech perception (pp. 51–78). Malden, MA: Blackwell.
Rosenblum, L. D., Miller, R. M., & Sanchez, K. (2007). Lip-read me now, hear me better later: Cross-modal transfer of talker-familiarity effects. Psychological Science, 18, 392–396.
Rosenblum, L. D., Niehus, R. P., & Smith, N. M. (2007). Look who’s talking: Recognizing friends from visible articulation. Perception, 36, 157–159.
Rosenblum, L. D., Smith, N. M., Nichols, S. M., Hale, S., & Lee, J. (2006). Hearing a face: Cross-modal speaker matching using isolated visible speech. Perception & Psychophysics, 68, 84–93.
Rosenblum, L. D., Yakel, D. A., Baseer, N., Panchal, A., Nordarse, B. C., & Niehus, R. P. (2002). Visual speech information for face recognition. Perception & Psychophysics, 64, 220–229.
Sancier, M. L., & Fowler, C. A. (1997). Gestural drift in a bilingual speaker of Brazilian Portuguese and English. Journal of Phonetics, 25, 421–436.
Schweinberger, S. R., & Soukup, G. R. (1998). Asymmetric relationships among perceptions of facial identity, emotion, and facial speech. Journal of Experimental Psychology: Human Perception & Performance, 24, 1748–1765.
Sheffert, S. M., & Fowler, C. A. (1995). The effects of voice and visible speaker change on memory for spoken words. Journal of Memory & Language, 34, 665–685.
Sheffert, S. M., & Olson, E. (2004). Audiovisual speech facilitates voice learning. Perception & Psychophysics, 66, 352–362.
Shockley, K., Sabadini, L., & Fowler, C. A. (2004). Imitation in shadowing words. Perception & Psychophysics, 66, 422–429.
Shockley, K., Santana, M. V., & Fowler, C. A. (2003). Mutual interpersonal postural constraints are involved in cooperative conversation. Journal of Experimental Psychology: Human Perception & Performance, 29, 326–332.
Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212–215.
Sundara, M., Namasivayam, A. K., & Chen, R. (2001). Observation-execution matching system for speech: A magnetic stimulation study. NeuroReport, 12, 1341–1344.
Thalheimer, W., & Cook, S. (2002, August). How to calculate effect sizes from published research articles: A simplified methodology. Retrieved November 31, 2002, from http://work-learning.com/ effect_sizes.htm.
Yakel, D. A., Rosenblum, L. D., & Fortier, M. A. (2000). Effects of talker variability on speechreading. Perception & Psychophysics, 62, 1405–1412.
This research was supported by NIDCD Grant 1R01DC008957-01.
About this article
Cite this article
Miller, R.M., Sanchez, K. & Rosenblum, L.D. Alignment to visual speech information. Attention, Perception, & Psychophysics 72, 1614–1625 (2010). https://doi.org/10.3758/APP.72.6.1614
- Visual Speech
- Auditory Speech
- Audio Video
- Visual Speech Information
- Versus ISUAL