Abstract
In this study, the nature of speech perception in native Mandarin Chinese speakers was compared with that in American English speakers, using synthetic visual and auditory continua (from /ba/ to /da/) in an expanded factorial design. In Experiment 1, speakers identified synthetic unimodal and bimodal speech syllables as either /ba/ or /da/. In Experiment 2, Mandarin speakers were given nine possible response alternatives. Syllable identification was influenced by both visual and auditory sources of information for both Mandarin and English speakers. Performance was better described by the fuzzy logical model of perception than by an auditory dominance model or a weighted-averaging model. Overall, the results are consistent with the idea that although there may be differences in information (which reflect differences in phonemic repertoires, phonetic realizations of the syllables, and the phonotactic constraints of languages), the underlying nature of audiovisual speech processing is similar across languages.
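To make the model comparison concrete, the sketch below contrasts the standard two-alternative form of the fuzzy logical model of perception (multiplicative integration followed by a relative-goodness decision) with a simple weighted-averaging rule. It is an illustration only, not the fitting procedure used in the study: the function names, the weight `w`, and the example support values are assumptions introduced here, and the auditory and visual support values are assumed to lie strictly between 0 and 1.

```python
def flmp(a: float, v: float) -> float:
    """Predicted P(/da/) under the two-alternative FLMP form.

    a, v: assumed degrees of auditory and visual support for /da/,
    each strictly between 0 and 1 (support for /ba/ is 1 - a, 1 - v).
    """
    support_da = a * v                      # multiplicative integration
    support_ba = (1.0 - a) * (1.0 - v)
    # Relative goodness rule: support for /da/ over total support.
    return support_da / (support_da + support_ba)


def weighted_average(a: float, v: float, w: float = 0.5) -> float:
    """Predicted P(/da/) under a weighted-averaging rule.

    w is a hypothetical weight on the auditory source (0 <= w <= 1).
    """
    return w * a + (1.0 - w) * v


if __name__ == "__main__":
    # Illustrative values only, not data from the experiments.
    for a, v in [(0.5, 0.8), (0.9, 0.9), (0.9, 0.2)]:
        print(a, v, round(flmp(a, v), 3), round(weighted_average(a, v), 3))
```

With an ambiguous auditory value (a = 0.5), the FLMP prediction is driven by the visual source, and two consistent sources yield a prediction more extreme than either alone, whereas averaging can never exceed the larger of its inputs. This is one way the two rules make distinguishable predictions in a factorial design.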
Additional information
The research and writing of this article were supported by Grants CDA-9726363, BCS-9905176, and IIS-0086107 from the National Science Foundation, Public Health Service Grant PHS R01 DC00236, a Cure Autism Now Foundation Innovative Technology Award, and the University of California, Santa Cruz (Cota-Robles Fellowship).
Cite this article
Chen, T.H., Massaro, D.W. Mandarin speech perception by ear and eye follows a universal principle. Perception & Psychophysics 66, 820–836 (2004). https://doi.org/10.3758/BF03194976