
Multi-Modal Perception of Tone

Chapter in: Speech Perception, Production and Acquisition

Part of the book series: Chinese Language Learning Sciences (CLLS)

Abstract

This chapter surveys the role of visual cues in Chinese lexical tone production and perception, asking whether visual information provides linguistically relevant cues that signal tonal category distinctions or is merely attention-grabbing in general. Specifically, the survey summarizes research findings on which visual facial cues are relevant for tone production, whether these cues are exploited in native and non-native audio-visual tone perception, and whether visual hand gestures also affect tone perception. Production findings demonstrate that head, jaw, eyebrow, and lip movements align with the specific spatial and temporal pitch trajectories of different tones, suggesting linguistically meaningful associations between these visual cues and tone articulation. Perception findings consistently show that facial and hand gestures corresponding to the pitch movements of individual tones benefit tone intelligibility, and that these benefits can be augmented by linguistic experience. Together, these findings suggest language-specific mechanisms in cross-modal tone production and perception.



Corresponding author: Yue Wang

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Wang, Y., Sereno, J.A., Jongman, A. (2020). Multi-Modal Perception of Tone. In: Liu, H., Tsao, F., Li, P. (eds) Speech Perception, Production and Acquisition. Chinese Language Learning Sciences. Springer, Singapore. https://doi.org/10.1007/978-981-15-7606-5_9

  • DOI: https://doi.org/10.1007/978-981-15-7606-5_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-7605-8

  • Online ISBN: 978-981-15-7606-5

  • eBook Packages: Education (R0)
