On the Production and the Perception of Audio-Visual Speech by Man and Machine

Chapter in: Multimedia Communications and Video Coding

Abstract

Since the 1950s, several experiments have evaluated the “benefit of lip-reading” for speech intelligibility, all presenting a natural face speaking at various levels of background noise (Sumby and Pollack, 1954; Neely, 1956; Erber, 1969; Binnie et al., 1974; Erber, 1975). We present here a similar experiment run with French stimuli.


References

  • Adjoudani, A. and Benoît, C., to appear, On the integration of auditory and visual parameters in an HMM-based ASR, in: Speechreading by Man and Machine, D. Stork, Ed., NATO-ASI series, Springer-Verlag (1996).

  • Benoît, C., Boë, L.J., and Abry, C., 1991, The effect of context on labiality in French, Proceedings of the 2nd Eurospeech Conference, Vol. 1, 153–156, Genoa, Italy.

  • Benoît, C., Lallouache, M.T., Mohamadi, T.M., and Abry, C., 1992, A set of French visemes for visual speech synthesis, in: Talking Machines: Theories, Models, and Designs, G. Bailly and C. Benoît, Eds, Elsevier Science Publishers, North-Holland, Amsterdam, 485–503.

  • Benoît, C., Mohamadi, T., and Kandel, S., 1994, Effect of phonetic context on audio-visual intelligibility of French, Journal of Speech and Hearing Research, 37, 1195–1203.

  • Binnie, C.A., Montgomery, A.A., and Jackson, P.L., 1974, Auditory and visual contributions to the perception of consonants, Journal of Speech and Hearing Research, 17, 619–630.

  • Cohen, M.M. and Massaro, D.W., 1993, Modeling coarticulation in synthetic visual speech, Computer Animation ’93, N. Magnenat-Thalmann and D. Thalmann, Eds, Springer-Verlag.

  • Erber, N.P., 1969, Interaction of audition and vision in the recognition of oral speech stimuli, Journal of Speech and Hearing Research, 12, 423–425.

  • Erber, N.P., 1975, Auditory-visual perception of speech, Journal of Speech and Hearing Disorders, 40, 481–492.

  • Guiard-Marigny, T. and Ostry, D.J., 1995, Three-dimensional visualization of human jaw motion in speech, Meeting of the Acoustical Society of America, Washington.

  • Guiard-Marigny, T., Benoît, C., and Ostry, D.J., 1995, Speech intelligibility of synthetic lips and jaw, Proc. of the 13th Int. Congress of Phonetic Sciences, Vol. 3, 222–226, Stockholm, Sweden.

  • Le Goff, B., Guiard-Marigny, T., and Benoît, C., 1994, Real-time analysis-synthesis and intelligibility of talking faces, Proc. of the 2nd International Workshop on Speech Synthesis, 53–56, New Paltz (NY), USA.

  • Le Goff, B., Guiard-Marigny, T., and Benoît, C., 1995, Read my lips… and my jaw! How intelligible are the components of a speaker’s face?, Proceedings of the 4th Eurospeech Conference, Vol. 1, 291–294, Madrid, Spain.

  • McGrath, M., 1985, An examination of cues for visual and audio-visual speech perception using natural and computer-generated faces, Ph.D. Thesis, University of Nottingham, UK.

  • Neely, K.K., 1956, Effect of visual factors on the intelligibility of speech, Journal of the Acoustical Society of America, 28, 1275–1277.

  • Sumby, W.H. and Pollack, I., 1954, Visual contribution to speech intelligibility in noise, Journal of the Acoustical Society of America, 26, 212–215.

  • Summerfield, Q., MacLeod, A., McGrath, M., and Brooke, M., 1989, Lips, teeth, and the benefit of lipreading, in: Handbook of Research on Face Processing, A.W. Young and H.D. Ellis, Eds, Elsevier Science Publishers, North-Holland, Amsterdam, 223–233.


Copyright information

© 1996 Plenum Press, New York

Cite this chapter

Benoît, C. (1996). On the Production and the Perception of Audio-Visual Speech by Man and Machine. In: Wang, Y., Panwar, S., Kim, SP., Bertoni, H.L. (eds) Multimedia Communications and Video Coding. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-0403-6_34

  • DOI: https://doi.org/10.1007/978-1-4613-0403-6_34

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-8036-8

  • Online ISBN: 978-1-4613-0403-6

  • eBook Packages: Springer Book Archive
