
Multi-Modal Perception of Tone

Chapter in: Speech Perception, Production and Acquisition

Part of the book series: Chinese Language Learning Sciences (CLLS)

Abstract

This chapter surveys the role of visual cues in Chinese lexical tone production and perception, asking whether visual information provides linguistically relevant cues that signal tonal category distinctions or is merely attention-grabbing in general. Specifically, the survey summarizes research findings on which visual facial cues are relevant for tone production, whether these cues are exploited in native and non-native audio-visual tone perception, and whether visual hand gestures also affect tone perception. Production findings demonstrate that head, jaw, eyebrow, and lip movements align with the specific spatial and temporal pitch trajectories of different tones, suggesting linguistically meaningful associations between these visual cues and tone articulation. Perception findings consistently show that facial and hand gestures corresponding to the pitch movements of individual tones benefit tone intelligibility, and that these benefits can be augmented by linguistic experience. Together, these findings suggest language-specific mechanisms in cross-modal tone production and perception.



Corresponding author: Yue Wang

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Wang, Y., Sereno, J.A., Jongman, A. (2020). Multi-Modal Perception of Tone. In: Liu, H., Tsao, F., Li, P. (eds) Speech Perception, Production and Acquisition. Chinese Language Learning Sciences. Springer, Singapore. https://doi.org/10.1007/978-981-15-7606-5_9

  • DOI: https://doi.org/10.1007/978-981-15-7606-5_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-7605-8

  • Online ISBN: 978-981-15-7606-5

  • eBook Packages: Education (R0)
