On the Production and the Perception of Audio-Visual Speech by Man and Machine

Chapter in: Multimedia Communications and Video Coding

Abstract

Since the 1950s, several experiments have evaluated the “benefit of lip-reading” for speech intelligibility, all presenting a natural face speaking at various levels of background noise (Sumby and Pollack, 1954; Neely, 1956; Erber, 1969; Binnie et al., 1974; Erber, 1975). We present here a similar experiment run with French stimuli.


References

  • Adjoudani, A. and Benoît, C., to appear, On the integration of auditory and visual parameters in an HMM-based ASR, in: Speechreading by Man and Machine, D. Stork, Ed., NATO-ASI series, Springer-Verlag (1996).

  • Benoît, C., Boë, L.J., and Abry, C., 1991, The effect of context on labiality in French, Proceedings of the 2nd Eurospeech Conference, Vol. 1, 153–156, Genoa, Italy.

  • Benoît, C., Lallouache, M.T., Mohamadi, T.M., and Abry, C., 1992, A set of French visemes for visual speech synthesis, in: Talking Machines: Theories, Models, and Designs, G. Bailly and C. Benoît, Eds, Elsevier Science Publishers, North-Holland, Amsterdam, 485–503.

  • Benoît, C., Mohamadi, T., and Kandel, S., 1994, Effect of phonetic context on audio-visual intelligibility of French, Journal of Speech and Hearing Research, 37, 1195–1203.

  • Binnie, C.A., Montgomery, A.A., and Jackson, P.L., 1974, Auditory and visual contributions to the perception of consonants, Journal of Speech and Hearing Research, 17, 619–630.

  • Cohen, M.M. and Massaro, D.W., 1993, Modeling coarticulation in synthetic visual speech, Computer Animation ’93, N. Magnenat-Thalmann and D. Thalmann, Eds, Springer-Verlag.

  • Erber, N.P., 1969, Interaction of audition and vision in the recognition of oral speech stimuli, Journal of Speech and Hearing Research, 12, 423–425.

  • Erber, N.P., 1975, Auditory-visual perception of speech, Journal of Speech and Hearing Disorders, 40, 481–492.

  • Guiard-Marigny, T. and Ostry, D.J., 1995, Three-dimensional visualization of human jaw motion in speech, Meeting of the Acoustical Society of America, Washington.

  • Guiard-Marigny, T., Benoît, C., and Ostry, D.J., 1995, Speech intelligibility of synthetic lips and jaw, Proc. of the 13th Int. Congress of Phonetic Sciences, Vol. 3, 222–226, Stockholm, Sweden.

  • Le Goff, B., Guiard-Marigny, T., and Benoît, C., 1994, Real-time analysis-synthesis and intelligibility of talking faces, Proc. of the 2nd International Workshop on Speech Synthesis, 53–56, New Paltz (NY), USA.

  • Le Goff, B., Guiard-Marigny, T., and Benoît, C., 1995, Read my lips… and my jaw! How intelligible are the components of a speaker’s face?, Proceedings of the 4th Eurospeech Conference, Vol. 1, 291–294, Madrid, Spain.

  • McGrath, M., 1985, An examination of cues for visual and audio-visual speech perception using natural and computer-generated faces, Ph.D. Thesis, University of Nottingham, UK.

  • Neely, K.K., 1956, Effect of visual factors on the intelligibility of speech, Journal of the Acoustical Society of America, 28, 1275–1277.

  • Sumby, W.H. and Pollack, I., 1954, Visual contribution to speech intelligibility in noise, Journal of the Acoustical Society of America, 26, 212–215.

  • Summerfield, Q., MacLeod, A., McGrath, M., and Brooke, M., 1989, Lips, teeth, and the benefit of lipreading, in: Handbook of Research on Face Processing, A.W. Young and H.D. Ellis, Eds, Elsevier Science Publishers, North-Holland, Amsterdam, 223–233.


Copyright information

© 1996 Plenum Press, New York

Cite this chapter

Benoît, C. (1996). On the Production and the Perception of Audio-Visual Speech by Man and Machine. In: Wang, Y., Panwar, S., Kim, SP., Bertoni, H.L. (eds) Multimedia Communications and Video Coding. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-0403-6_34

  • DOI: https://doi.org/10.1007/978-1-4613-0403-6_34

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-8036-8

  • Online ISBN: 978-1-4613-0403-6

  • eBook Packages: Springer Book Archive
