Discriminating languages by speech-reading

Abstract

The goal of this study was to explore the ability to discriminate languages using the visual correlates of speech (i.e., speech-reading). Participants were presented with silent video clips of an actor pronouncing two sentences (in Catalan and/or Spanish) and were asked to judge whether the sentences were in the same language or in different languages. Our results established that Spanish-Catalan bilingual speakers could discriminate running speech from their two languages on the basis of visual cues alone (Experiment 1). However, we found that this ability was critically restricted by linguistic experience, since Italian and English speakers who were unfamiliar with the test languages could not successfully discriminate the stimuli (Experiment 2). A test of Spanish monolingual speakers revealed that knowledge of only one of the two test languages was sufficient to achieve the discrimination, although at a lower level of accuracy than that seen in bilingual speakers (Experiment 3). Finally, we evaluated the ability to identify the language by speech-reading particularly distinctive words (Experiment 4). The results obtained are in accord with recent proposals arguing that the visual speech signal is rich in informational content, above and beyond what traditional accounts based solely on visemic confusion matrices would predict.

Author information

Corresponding author

Correspondence to Salvador Soto-Faraco.

Additional information

This work was supported by a grant from the Human Early Learning Partnership of British Columbia; Grant TIN2004-04363-C03-02 from the Ministerio de Educación y Ciencia of Spain and the "Ramón y Cajal" Program; Grant 410-2004-0744 from the Social Sciences and Humanities Research Council of Canada; Human Frontier Science Program Grant RPG 68/2002; and Grant JSMF-20002079 from the James S. McDonnell Foundation.

About this article

Cite this article

Soto-Faraco, S., Navarra, J., Weikum, W.M. et al. Discriminating languages by speech-reading. Perception & Psychophysics 69, 218–231 (2007). https://doi.org/10.3758/BF03193744

Keywords

  • Speech Perception
  • Visual Speech
  • Linguistic Group
  • Visual Correlate
  • Bilingual Speaker