
Visible speech improves human language understanding: Implications for speech processing systems

Artificial Intelligence Review

Abstract

This paper presents evidence from the study of human language understanding suggesting that the ability to perceive visible speech can strongly influence how well we understand and remember spoken language. Seeing the speaker's face can substantially improve the perception of ambiguous or noisy speech. It can also support the cognitive processing of speech, leading to better comprehension and recall. Some of these effects have been replicated using computer-synthesized visual and auditory speech. Thus, it appears that when giving an interface a voice, it may be best to give it a face too.




About this article

Cite this article

Thompson, L.A., Ogden, W.C. Visible speech improves human language understanding: Implications for speech processing systems. Artif Intell Rev 9, 347–358 (1995). https://doi.org/10.1007/BF00849044


  • DOI: https://doi.org/10.1007/BF00849044
