Alignment to visual speech information

Abstract

Speech alignment is the tendency for interlocutors to unconsciously imitate one another’s speaking style. Alignment also occurs when a talker is asked to shadow recorded words (e.g., Shockley, Sabadini, & Fowler, 2004). In two experiments, we examined whether alignment could be induced with visual (lipread) speech and with auditory speech. In Experiment 1, we asked subjects to lipread and shadow out loud a model silently uttering words. The results indicate that shadowed utterances sounded more similar to the model’s utterances than did subjects’ nonshadowed read utterances. This suggests that speech alignment can be based on visual speech. In Experiment 2, we tested whether raters could perceive alignment across modalities. Raters were asked to judge the relative similarity between a model’s visual (silent video) utterance and subjects’ audio utterances. The subjects’ shadowed utterances were again judged as more similar to the model’s than were read utterances, suggesting that raters are sensitive to cross-modal similarity between aligned words.
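
For concreteness, here is a minimal Python sketch of how such two-alternative similarity judgments could be scored. It is a hypothetical illustration of the task logic described above, not the authors' actual analysis: the trial responses are invented, and the function name is our own. The idea is simply that alignment shows up as raters choosing the shadowed token as more model-like at a rate above chance.

```python
# Hypothetical scoring sketch for a two-alternative similarity-judgment task:
# on each trial a rater is given the model's utterance plus a subject's
# shadowed and read tokens, and picks which subject token is more similar
# to the model. All names and data here are invented for illustration.
from math import comb

def binom_p_greater(hits: int, n: int, p: float = 0.5) -> float:
    """One-sided exact binomial test: P(X >= hits) when chance rate is p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1))

# 1 = rater chose the shadowed token as more similar to the model's utterance,
# 0 = rater chose the read (non-shadowed) token. Invented example responses.
trials = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]

hits = sum(trials)
n = len(trials)

# Alignment is indicated when shadowed tokens are chosen above chance (.5).
print(f"shadowed chosen on {hits}/{n} trials ({hits / n:.0%})")
print(f"one-sided binomial p = {binom_p_greater(hits, n):.4f}")
```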

References

  1. Arnold, P., & Hill, F. (2001). Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact. British Journal of Psychology, 92, 339–355.

  2. Calvert, G. A., Bullmore, E., Brammer, M. J., Campbell, R., Iversen, S. D., Woodruff, P., et al. (1997). Silent lipreading activates the auditory cortex. Science, 276, 593–596.

  3. Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76, 893–910.

  4. Davis, C., & Kim, J. (2001). Repeating and remembering foreign language words: Implications for language teaching systems. Artificial Intelligence Review, 16, 37–47.

  5. Fadiga, L., Craighero, L., Buccino, G., & Rizzolatti, G. (2002). Speech listening specifically modulates the excitability of tongue muscles: A TMS study. European Journal of Neuroscience, 15, 399–402.

  6. Fowler, C. A. (2004). Speech as a supermodal or amodal phenomenon. In G. A. Calvert, C. Spence, & B. E. Stein (Eds.), The handbook of multisensory processing (pp. 189–201). Cambridge, MA: MIT Press.

  7. Fowler, C. A., Brown, J. M., Sabadini, L., & Weihing, J. (2003). Rapid access to speech gestures in perception: Evidence from choice and simple response time tasks. Journal of Memory & Language, 49, 396–413.

  8. Gentilucci, M., & Bernardis, P. (2007). Imitation during phoneme production. Neuropsychologia, 45, 608–615.

  9. Giles, H., Coupland, N., & Coupland, J. (1991). Accommodation theory: Communication, context, and consequences. In H. Giles, N. Coupland, & J. Coupland (Eds.), Contexts of accommodation: Developments in applied sociolinguistics (pp. 1–68). Cambridge: Cambridge University Press.

  10. Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105, 251–279.

  11. Goldinger, S. D., & Azuma, T. (2004). Episodic memory reflected in printed word naming. Psychonomic Bulletin & Review, 11, 716–722.

  12. Grant, K. W., & Seitz, P. F. (2000). The use of visible speech cues for improving auditory detection of spoken sentences. Journal of the Acoustical Society of America, 108, 1197–1208.

  13. Gregory, S. W. (1990). Analysis of fundamental frequency reveals covariation in interview partners’ speech. Journal of Nonverbal Behavior, 14, 237–251.

  14. Kamachi, M., Hill, H., Lander, K., & Vatikiotis-Bateson, E. (2003). “Putting the face to the voice”: Matching identity across modality. Current Biology, 13, 1709–1714.

  15. Kaufmann, J. M., & Schweinberger, S. R. (2005). Speaker variations influence speechreading speed for dynamic faces. Perception, 34, 595–610.

  16. Kerzel, D., & Bekkering, H. (2000). Motor activation from visible speech: Evidence from stimulus-response compatibility. Journal of Experimental Psychology: Human Perception & Performance, 26, 634–647.

  17. Kozhevnikov, V., & Chistovich, L. (1965). Speech: Articulation and perception (JPRS Publication 50, 543). Washington, DC: Joint Publications Research Service.

  18. Kučera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.

  19. Lachs, L., & Pisoni, D. B. (2004a). Crossmodal source identification in speech perception. Ecological Psychology, 16, 159–187.

  20. Lachs, L., & Pisoni, D. B. (2004b). Cross-modal source information and spoken word recognition. Journal of Experimental Psychology: Human Perception & Performance, 30, 378–396.

  21. Lachs, L., & Pisoni, D. B. (2004c). Specification of cross-modal source information in isolated kinematic displays of speech. Journal of the Acoustical Society of America, 116, 507–518.

  22. Lander, K., & Davies, R. (2008). Does face familiarity influence speechreadability? Quarterly Journal of Experimental Psychology, 61, 961–967.

  23. MacSweeney, M., Amaro, E., Calvert, G. A., Campbell, R., David, A. S., McGuire, P., et al. (2000). Silent speechreading in the absence of scanner noise: An event-related fMRI study. NeuroReport, 11, 1729–1733.

  24. MacSweeney, M., Calvert, G. A., Campbell, R., McGuire, P. K., David, A. S., Williams, S. C. R., et al. (2002). Speechreading circuits in people born deaf. Neuropsychologia, 40, 801–807.

  25. McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.

  26. Meltzoff, A. N., & Moore, M. K. (1997). Explaining facial imitation: A theoretical model. Early Development & Parenting, 6, 179–192.

  27. Mills, A. E. (1987). The development of phonology in the blind child. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 145–162). Hillsdale, NJ: Erlbaum.

  28. Nakamura, M., Iwano, K., & Furui, S. (2008). Differences between acoustic characteristics of spontaneous and read speech and their effects on recognition performance. Computer Speech & Language, 22, 171–184.

  29. Namy, L. L., Nygaard, L. C., & Sauerteig, D. (2002). Gender differences in vocal accommodation: The role of perception. Journal of Language & Social Psychology, 21, 422–432.

  30. Natale, M. (1975). Convergence of mean vocal intensity in dyadic communication as a function of social desirability. Journal of Personality & Social Psychology, 32, 790–804.

  31. Navarra, J., & Soto-Faraco, S. (2007). Hearing lips in a second language: Visual articulatory information enables the perception of L2 sounds. Psychological Research, 71, 4–12.

  32. Nygaard, L. C. (2005). The integration of linguistic and non-linguistic properties of speech. In D. Pisoni & R. Remez (Eds.), Handbook of speech perception (pp. 390–414). Malden, MA: Blackwell.

  33. Pardo, J. S. (2004). Acoustic-phonetic convergence among interacting talkers. Journal of the Acoustical Society of America, 115, 2608.

  34. Pardo, J. S. (2006). On phonetic convergence during conversational interaction. Journal of the Acoustical Society of America, 119, 2382–2393.

  35. Pardo, J. S., & Remez, R. E. (2006). The perception of speech. In M. Traxler & M. A. Gernsbacher (Eds.), The handbook of psycholinguistics (2nd ed., pp. 201–248). New York: Academic Press.

  36. Porter, R. J., Jr., & Castellanos, F. X. (1980). Speech production measures of speech perception: Rapid shadowing of VCV syllables. Journal of the Acoustical Society of America, 67, 1349–1356.

  37. Porter, R. J., Jr., & Lubker, J. F. (1980). Rapid reproduction of vowel-vowel sequences: Evidence for a fast and direct acoustic-motoric linkage. Journal of Speech & Hearing Research, 23, 593–602.

  38. Reisberg, D., McLean, J., & Goldfield, A. (1987). Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli. In B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lip-reading (pp. 97–114). Hillsdale, NJ: Erlbaum.

  39. Rosenblum, L. D. (2005). The primacy of multimodal speech perception. In D. Pisoni & R. Remez (Eds.), Handbook of speech perception (pp. 51–78). Malden, MA: Blackwell.

  40. Rosenblum, L. D., Miller, R. M., & Sanchez, K. (2007). Lip-read me now, hear me better later: Cross-modal transfer of talker-familiarity effects. Psychological Science, 18, 392–396.

  41. Rosenblum, L. D., Niehus, R. P., & Smith, N. M. (2007). Look who’s talking: Recognizing friends from visible articulation. Perception, 36, 157–159.

  42. Rosenblum, L. D., Smith, N. M., Nichols, S. M., Hale, S., & Lee, J. (2006). Hearing a face: Cross-modal speaker matching using isolated visible speech. Perception & Psychophysics, 68, 84–93.

  43. Rosenblum, L. D., Yakel, D. A., Baseer, N., Panchal, A., Nordarse, B. C., & Niehus, R. P. (2002). Visual speech information for face recognition. Perception & Psychophysics, 64, 220–229.

  44. Sancier, M. L., & Fowler, C. A. (1997). Gestural drift in a bilingual speaker of Brazilian Portuguese and English. Journal of Phonetics, 25, 421–436.

  45. Schweinberger, S. R., & Soukup, G. R. (1998). Asymmetric relationships among perceptions of facial identity, emotion, and facial speech. Journal of Experimental Psychology: Human Perception & Performance, 24, 1748–1765.

  46. Sheffert, S. M., & Fowler, C. A. (1995). The effects of voice and visible speaker change on memory for spoken words. Journal of Memory & Language, 34, 665–685.

  47. Sheffert, S. M., & Olson, E. (2004). Audiovisual speech facilitates voice learning. Perception & Psychophysics, 66, 352–362.

  48. Shockley, K., Sabadini, L., & Fowler, C. A. (2004). Imitation in shadowing words. Perception & Psychophysics, 66, 422–429.

  49. Shockley, K., Santana, M. V., & Fowler, C. A. (2003). Mutual interpersonal postural constraints are involved in cooperative conversation. Journal of Experimental Psychology: Human Perception & Performance, 29, 326–332.

  50. Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212–215.

  51. Sundara, M., Namasivayam, A. K., & Chen, R. (2001). Observation-execution matching system for speech: A magnetic stimulation study. NeuroReport, 12, 1341–1344.

  52. Thalheimer, W., & Cook, S. (2002, August). How to calculate effect sizes from published research articles: A simplified methodology. Retrieved November 31, 2002, from http://work-learning.com/effect_sizes.htm.

  53. Yakel, D. A., Rosenblum, L. D., & Fortier, M. A. (2000). Effects of talker variability on speechreading. Perception & Psychophysics, 62, 1405–1412.

Author information

Corresponding author

Correspondence to Lawrence D. Rosenblum.

Additional information

This research was supported by NIDCD Grant 1R01DC008957-01.

Cite this article

Miller, R.M., Sanchez, K. & Rosenblum, L.D. Alignment to visual speech information. Attention, Perception, & Psychophysics 72, 1614–1625 (2010). https://doi.org/10.3758/APP.72.6.1614

Keywords

  • Visual Speech
  • Auditory Speech
  • Audio Video
  • Visual Speech Information