Multimodal Speech Perception: A Paradigm for Speech Science

  • Chapter
Multimodality in Language and Speech Systems

Part of the book series: Text, Speech and Language Technology (TLTB, volume 19)

Abstract

Speech science evolved as the study of a unimodal phenomenon. Speech was viewed as a solely auditory event, as captured by the seminal speech-chain illustration of Denes & Pinson (1963) shown in Figure 1.

References

  • Campbell, C.S. & D.W. Massaro. “Perception of visible speech: influence of spatial quantization”, Perception, 26, 627–644, 1997.

  • Cave, C., I. Guaitella, R. Bertrand, S. Santi, F. Harlay & R. Espesser. “About the relationship between eyebrow movements and F0 variations”. Proceedings of the International Conference on Spoken Language Processing (pp. 2175–2178). Wilmington: University of Delaware, 1996.

  • Cohen, M.M., R.L. Walker & D.W. Massaro. “Perception of synthetic visual speech”. In: D.G. Stork & M.E. Hennecke (Eds.), Speechreading by humans and machines (pp. 153–168). New York: Springer, 1996.

  • Cole, R., T. Carmell, P. Connors, M. Macon, J. Wouters, J. deVilliers, A. Tarachow, D.W. Massaro, M.M. Cohen, J. Beskow, J. Yang, U. Meier, A. Waibel, P. Stone, G. Fortier, A. Davis, C. Soland. “Intelligent Animated Agents for Interactive Language Training”. Proceedings of Speech Technology in Language Learning. Stockholm, Sweden, 1998.

  • Crowther, C.S., W.H. Batchelder & X. Hu. “A measurement-theoretical analysis of the Fuzzy Logical Model of Perception”. Psychological Review, 102, 396–408, 1995.

  • Cutting, J.E., N. Bruno, N.P. Brady & C. Moore. “Selectivity, scope, and simplicity of models: A lesson from fitting judgments of perceived depth”. Journal of Experimental Psychology: General, 121, 364–381, 1992.

  • Denes, P.B. “On the statistics of spoken English”. Journal of the Acoustical Society of America, 35, 892–904, 1963.

  • Diehl, R.L. & K.R. Kluender. “On the categorization of speech sounds”. In: S. Harnad (Ed.), Categorical perception (pp. 226–253). Cambridge: Cambridge University Press, 1987.

  • Diehl, R.L. & K.R. Kluender. “On the objects of speech perception”. Ecological Psychology, 1, 121–144, 1989.

  • De Yoe, E.A. & D.C. Van Essen. “Concurrent processing streams in monkey visual cortex”. Trends in Neurosciences, 11, 219–226, 1988.

  • Ekman, P. & W. Friesen. Pictures of facial affect. Palo Alto, CA: Consulting Psychologists Press, 1975.

  • Ellison, J.W. & D.W. Massaro. “Featural evaluation, integration, and judgement of facial affect”. Journal of Experimental Psychology: Human Perception and Performance, 23, 213–226, 1997.

  • Fowler, C.A. “Listeners do hear sounds, not tongues”. Journal of the Acoustical Society of America, 99, 1730–1741, 1996.

  • Frost, R., B.H. Repp & L. Katz. “Can speech perception be influenced by simultaneous presentation of print?” Journal of Memory and Language, 27, 741–755, 1988.

  • Green, K.P. “The use of auditory and visual information during phonetic processing: Implications for theories of speech perception”. In: Campbell, R., B. Dodd & D. Burnham (Eds.), Hearing by Eye II (pp. 3–25). East Sussex, UK: Psychology Press Ltd, 1998.

  • Grosjean, F. “Spoken word recognition processes and the gating paradigm”. Perception & Psychophysics, 28, 267–283, 1980.

  • Kass, R.E. & A.E. Raftery. “Bayes factors”. Journal of the American Statistical Association, 90, 773–795, 1995.

  • Liberman, A.M. & I.G. Mattingly. “The motor theory of speech perception revised”. Cognition, 21, 1–33, 1985.

  • Lisker, L. “Rabid vs rapid: A catalog of acoustic features that may cue the distinction”. Haskins Laboratories, Status Report on Speech Research, SR-54, 127–132, 1978.

  • Massaro, D.W. Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry. Hillsdale, NJ: Lawrence Erlbaum Associates, 1987.

  • Massaro, D.W. Multiple book review of Speech perception by ear and eye: a paradigm for psychological inquiry, by D.W. Massaro. Behavioral and Brain Sciences, 12, 741–794, 1989.

  • Massaro, D.W. “Integration of multiple sources of information in language processing”. In: T. Inui & J.L. McClelland (Eds.), Attention and Performance XVI: Information integration in perception and communication (pp. 397–432). Cambridge, MA: MIT Press, 1996.

  • Massaro, D.W. Perceiving Talking Faces: From Speech Perception to a Behavioral Principle. Cambridge, MA: MIT Press, 1998.

  • Massaro, D.W. & M.M. Cohen. “Evaluation and integration of visual and auditory information in speech perception”. Journal of Experimental Psychology: Human Perception and Performance, 9, 753–771, 1983.

  • Massaro, D.W. & M.M. Cohen. “Perception of synthesized audible and visible speech”. Psychological Science, 1, 55–63, 1990.

  • Massaro, D.W. & M.M. Cohen. “Speech Perception in Perceivers with Hearing Loss: Synergy of Multiple Modalities”. Journal of Speech, Language, and Hearing Research, 42, 21–41, 1999.

  • Massaro, D.W. & P.B. Egan. “Perceiving affect from the voice and the face”. Psychonomic Bulletin and Review, 3, 215–221, 1996.

  • Massaro, D.W. & D. Friedman. “Models of integration given multiple sources of information”, Psychological Review, 97 (2), 225–252, 1990.

  • Massaro, D.W. & D.G. Stork. “Speech recognition and sensory integration”. American Scientist, 86, 236–244, 1998.

  • Massaro, D.W., M.M. Cohen & P.M.T. Smeele. “Cross-linguistic Comparisons in the Integration of Visual and Auditory Speech”. Memory and Cognition, 23 (1), 113–131, 1995.

  • Massaro, D.W., M.M. Cohen & L.A. Thompson. “Visible language in speech perception: Lipreading and reading,” Visible Language, 22, 9–31, 1988.

  • Massaro, D.W., M.M. Cohen, C.S. Campbell & T. Rodriguez. “Bayes factor of model selection validates FLMP”. Psychonomic Bulletin & Review, 8, 1–17, 2001.

  • Massaro, D.W., M. Tsuzaki, M.M. Cohen, A. Gesi & R. Heredia. “Bimodal Speech Perception: An Examination across Languages”, Journal of Phonetics, 21, 445–478, 1993.

  • Mattingly, I.G. & M. Studdert-Kennedy (Eds.). Modularity and the motor theory of speech perception. Hillsdale, NJ: Lawrence Erlbaum, 1991.

  • McGurk, H. & J. MacDonald. “Hearing lips and seeing voices”. Nature, 264, 746–748, 1976.

  • Munhall, K.G. & Y. Tohkura. “Audiovisual gating and the time course of speech perception”. Journal of the Acoustical Society of America, 104, 530–539, 1998.

  • Myung, I.J. & M.A. Pitt. “Applying Occam’s razor in modeling cognition: A Bayesian approach”. Psychonomic Bulletin & Review, 4, 79–95, 1997.

  • Oerlemans, M. & P. Blamey. “Touch and auditory-visual speech perception”. In: Campbell, R., B. Dodd & D. Burnham (Eds.), Hearing by Eye II (pp. 267–281). East Sussex, UK: Psychology Press Ltd, 1998.

  • Palmer, S.E. Vision Science: Photons to Phenomenology. Cambridge, MA: MIT Press, 1999.

  • Pitt, M.A. & J. M. McQueen. “Is Compensation for Coarticulation Mediated by the Lexicon?” Journal of Memory and Language, 39, 347–370, 1998.

  • Rosenblum, L.D. & H.M. Saldana. “An audio-visual test of kinematic primitives for visual speech perception”. Journal of Experimental Psychology: Human Perception and Performance, 22, 318–331, 1996.

  • Rosenblum, L.D. & H.M. Saldana. “Time-varying information for visual speech perception”. In: Campbell, R., B. Dodd & D. Burnham (Eds.), Hearing by Eye II (pp. 61–81). East Sussex, UK: Psychology Press Ltd, 1998.

  • Schindler, R.A. & M.M. Merzenich. Cochlear Implants. New York: Raven, 1985.

  • Schwartz, J., J. Robert-Ribes & P. Escudier. “Ten years after Summerfield: A taxonomy of models for audio-visual fusion in speech perception”. In: Campbell, R., B. Dodd & D. Burnham (Eds.), Hearing by Eye II (pp. 85–108). East Sussex, UK: Psychology Press Ltd, 1998.

  • Sekiyama, K. “Face or voice? Determinant of compellingness to the McGurk effect”. Proceedings of AVSP’98. Terrigal-Sydney, Australia, 1998.

  • Tyler, R.S., J.M. Opie, H. Fryauf-Bertschy & B.J. Gantz. “Future directions for cochlear implants”. Journal of Speech-Language Pathology and Audiology, 16, 151–164, 1992.

  • Warren, R.M. “Perceptual restoration of missing speech sounds”. Science, 167, 392–393, 1970.

Copyright information

© 2002 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Massaro, D.W. (2002). Multimodal Speech Perception: A Paradigm for Speech Science. In: Granström, B., House, D., Karlsson, I. (eds) Multimodality in Language and Speech Systems. Text, Speech and Language Technology, vol 19. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2367-1_4

  • DOI: https://doi.org/10.1007/978-94-017-2367-1_4

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-6024-2

  • Online ISBN: 978-94-017-2367-1

  • eBook Packages: Springer Book Archive
