Attention, Perception, & Psychophysics, Volume 72, Issue 6, pp 1614–1625

Alignment to visual speech information

  • Rachel M. Miller
  • Kauyumari Sanchez
  • Lawrence D. Rosenblum
Research Article


Abstract

Speech alignment is the tendency for interlocutors to unconsciously imitate one another’s speaking style. Alignment also occurs when a talker is asked to shadow recorded words (e.g., Shockley, Sabadini, & Fowler, 2004). In two experiments, we examined whether alignment could be induced with visual (lipread) speech and with auditory speech. In Experiment 1, we asked subjects to lipread and shadow out loud a model silently uttering words. The results indicate that shadowed utterances sounded more similar to the model’s utterances than did subjects’ nonshadowed read utterances. This suggests that speech alignment can be based on visual speech. In Experiment 2, we tested whether raters could perceive alignment across modalities. Raters were asked to judge the relative similarity between a model’s visual (silent video) utterance and subjects’ audio utterances. The subjects’ shadowed utterances were again judged as more similar to the model’s than were read utterances, suggesting that raters are sensitive to cross-modal similarity between aligned words.




References

  1. Arnold, P., & Hill, F. (2001). Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact. British Journal of Psychology, 92, 339–355.
  2. Calvert, G. A., Bullmore, E., Brammer, M. J., Campbell, R., Iversen, S. D., Woodruff, P., et al. (1997). Silent lipreading activates the auditory cortex. Science, 276, 593–596.
  3. Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76, 893–910.
  4. Davis, C., & Kim, J. (2001). Repeating and remembering foreign language words: Implications for language teaching systems. Artificial Intelligence Review, 16, 37–47.
  5. Fadiga, L., Craighero, L., Buccino, G., & Rizzolatti, G. (2002). Speech listening specifically modulates the excitability of tongue muscles: A TMS study. European Journal of Neuroscience, 15, 399–402.
  6. Fowler, C. A. (2004). Speech as a supermodal or amodal phenomenon. In G. A. Calvert, C. Spence, & B. E. Stein (Eds.), The handbook of multisensory processing (pp. 189–201). Cambridge, MA: MIT Press.
  7. Fowler, C. A., Brown, J. M., Sabadini, L., & Weihing, J. (2003). Rapid access to speech gestures in perception: Evidence from choice and simple response time tasks. Journal of Memory & Language, 49, 396–413.
  8. Gentilucci, M., & Bernardis, P. (2007). Imitation during phoneme production. Neuropsychologia, 45, 608–615.
  9. Giles, H., Coupland, N., & Coupland, J. (1991). Accommodation theory: Communication, context, and consequences. In H. Giles, N. Coupland, & J. Coupland (Eds.), Contexts of accommodation: Developments in applied sociolinguistics (pp. 1–68). Cambridge: Cambridge University Press.
  10. Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105, 251–279.
  11. Goldinger, S. D., & Azuma, T. (2004). Episodic memory reflected in printed word naming. Psychonomic Bulletin & Review, 11, 716–722.
  12. Grant, K. W., & Seitz, P. F. (2000). The use of visible speech cues for improving auditory detection of spoken sentences. Journal of the Acoustical Society of America, 108, 1197–1208.
  13. Gregory, S. W. (1990). Analysis of fundamental frequency reveals covariation in interview partners’ speech. Journal of Nonverbal Behavior, 14, 237–251.
  14. Kamachi, M., Hill, H., Lander, K., & Vatikiotis-Bateson, E. (2003). “Putting the face to the voice”: Matching identity across modality. Current Biology, 13, 1709–1714.
  15. Kaufmann, J. M., & Schweinberger, S. R. (2005). Speaker variations influence speechreading speed for dynamic faces. Perception, 34, 595–610.
  16. Kerzel, D., & Bekkering, H. (2000). Motor activation from visible speech: Evidence from stimulus-response compatibility. Journal of Experimental Psychology: Human Perception & Performance, 26, 634–647.
  17. Kozhevnikov, V., & Chistovich, L. (1965). Speech: Articulation and perception (JPRS Publication 50, 543). Washington, DC: Joint Publications Research Service.
  18. Kučera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
  19. Lachs, L., & Pisoni, D. B. (2004a). Crossmodal source identification in speech perception. Ecological Psychology, 16, 159–187.
  20. Lachs, L., & Pisoni, D. B. (2004b). Cross-modal source information and spoken word recognition. Journal of Experimental Psychology: Human Perception & Performance, 30, 378–396.
  21. Lachs, L., & Pisoni, D. B. (2004c). Specification of cross-modal source information in isolated kinematic displays of speech. Journal of the Acoustical Society of America, 116, 507–518.
  22. Lander, K., & Davies, R. (2008). Does face familiarity influence speechreadability? Quarterly Journal of Experimental Psychology, 61, 961–967.
  23. MacSweeney, M., Amaro, E., Calvert, G. A., Campbell, R., David, A. S., McGuire, P., et al. (2000). Silent speechreading in the absence of scanner noise: An event-related fMRI study. NeuroReport, 11, 1729–1733.
  24. MacSweeney, M., Calvert, G. A., Campbell, R., McGuire, P. K., David, A. S., Williams, S. C. R., et al. (2002). Speechreading circuits in people born deaf. Neuropsychologia, 40, 801–807.
  25. McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.
  26. Meltzoff, A. N., & Moore, M. K. (1997). Explaining facial imitation: A theoretical model. Early Development & Parenting, 6, 179–192.
  27. Mills, A. E. (1987). The development of phonology in the blind child. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 145–162). Hillsdale, NJ: Erlbaum.
  28. Nakamura, M., Iwano, K., & Furui, S. (2008). Differences between acoustic characteristics of spontaneous and read speech and their effects on recognition performance. Computer Speech & Language, 22, 171–184.
  29. Namy, L. L., Nygaard, L. C., & Sauerteig, D. (2002). Gender differences in vocal accommodation: The role of perception. Journal of Language & Social Psychology, 21, 422–432.
  30. Natale, M. (1975). Convergence of mean vocal intensity in dyadic communication as a function of social desirability. Journal of Personality & Social Psychology, 32, 790–804.
  31. Navarra, J., & Soto-Faraco, S. (2007). Hearing lips in a second language: Visual articulatory information enables the perception of L2 sounds. Psychological Research, 71, 4–12.
  32. Nygaard, L. C. (2005). The integration of linguistic and non-linguistic properties of speech. In D. Pisoni & R. Remez (Eds.), Handbook of speech perception (pp. 390–414). Malden, MA: Blackwell.
  33. Pardo, J. S. (2004). Acoustic-phonetic convergence among interacting talkers. Journal of the Acoustical Society of America, 115, 2608.
  34. Pardo, J. S. (2006). On phonetic convergence during conversational interaction. Journal of the Acoustical Society of America, 119, 2382–2393.
  35. Pardo, J. S., & Remez, R. E. (2006). The perception of speech. In M. Traxler & M. A. Gernsbacher (Eds.), The handbook of psycholinguistics (2nd ed., pp. 201–248). New York: Academic Press.
  36. Porter, R. J., Jr., & Castellanos, F. X. (1980). Speech production measures of speech perception: Rapid shadowing of VCV syllables. Journal of the Acoustical Society of America, 67, 1349–1356.
  37. Porter, R. J., Jr., & Lubker, J. F. (1980). Rapid reproduction of vowel-vowel sequences: Evidence for a fast and direct acoustic-motoric linkage. Journal of Speech & Hearing Research, 23, 593–602.
  38. Reisberg, D., McLean, J., & Goldfield, A. (1987). Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 97–114). Hillsdale, NJ: Erlbaum.
  39. Rosenblum, L. D. (2005). The primacy of multimodal speech perception. In D. Pisoni & R. Remez (Eds.), Handbook of speech perception (pp. 51–78). Malden, MA: Blackwell.
  40. Rosenblum, L. D., Miller, R. M., & Sanchez, K. (2007). Lip-read me now, hear me better later: Cross-modal transfer of talker-familiarity effects. Psychological Science, 18, 392–396.
  41. Rosenblum, L. D., Niehus, R. P., & Smith, N. M. (2007). Look who’s talking: Recognizing friends from visible articulation. Perception, 36, 157–159.
  42. Rosenblum, L. D., Smith, N. M., Nichols, S. M., Hale, S., & Lee, J. (2006). Hearing a face: Cross-modal speaker matching using isolated visible speech. Perception & Psychophysics, 68, 84–93.
  43. Rosenblum, L. D., Yakel, D. A., Baseer, N., Panchal, A., Nordarse, B. C., & Niehus, R. P. (2002). Visual speech information for face recognition. Perception & Psychophysics, 64, 220–229.
  44. Sancier, M. L., & Fowler, C. A. (1997). Gestural drift in a bilingual speaker of Brazilian Portuguese and English. Journal of Phonetics, 25, 421–436.
  45. Schweinberger, S. R., & Soukup, G. R. (1998). Asymmetric relationships among perceptions of facial identity, emotion, and facial speech. Journal of Experimental Psychology: Human Perception & Performance, 24, 1748–1765.
  46. Sheffert, S. M., & Fowler, C. A. (1995). The effects of voice and visible speaker change on memory for spoken words. Journal of Memory & Language, 34, 665–685.
  47. Sheffert, S. M., & Olson, E. (2004). Audiovisual speech facilitates voice learning. Perception & Psychophysics, 66, 352–362.
  48. Shockley, K., Sabadini, L., & Fowler, C. A. (2004). Imitation in shadowing words. Perception & Psychophysics, 66, 422–429.
  49. Shockley, K., Santana, M. V., & Fowler, C. A. (2003). Mutual interpersonal postural constraints are involved in cooperative conversation. Journal of Experimental Psychology: Human Perception & Performance, 29, 326–332.
  50. Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212–215.
  51. Sundara, M., Namasivayam, A. K., & Chen, R. (2001). Observation-execution matching system for speech: A magnetic stimulation study. NeuroReport, 12, 1341–1344.
  52. Thalheimer, W., & Cook, S. (2002, August). How to calculate effect sizes from published research articles: A simplified methodology. Retrieved November 31, 2002, from effect_sizes.htm.
  53. Yakel, D. A., Rosenblum, L. D., & Fortier, M. A. (2000). Effects of talker variability on speechreading. Perception & Psychophysics, 62, 1405–1412.

Copyright information

© Psychonomic Society, Inc. 2010

Authors and Affiliations

  • Rachel M. Miller (1)
  • Kauyumari Sanchez (1)
  • Lawrence D. Rosenblum (1)

  1. Department of Psychology, University of California, Riverside
