Abstract
This chapter presents a study of speech expressiveness and information arrangement in relation to discourse prosody in continuous Mandarin speech. Based on corpus data from four diverse speech genres, the study examined how perceived prosodic highlights correlate with speech expressiveness and how prominence patterns converge for discourse prosody. Using a corpus-linguistic approach and quantitative analyses, we first summarized the number of perceived emphasis token patterns and their distribution across speech genres. We then conducted two experiments: (i) speech expressiveness, measured by an information weighting calculation based on the allocation of prosodic highlights, and (ii) discourse prosody, examined through the convergence of patterned prosodic highlights in limited degrees of contrastive strength. The results of the first experiment pinpointed major differences in expressiveness across speech genres, demonstrating that the most spontaneous type of speech carried the largest amount of information. The second experiment found that a limited number of intonation patterns converged for higher-level discourse prosody. Ultimately, our research uncovered the sources contributing to speech expressiveness and diversity across speech genres, while at the same time showing that divergent surface variations in speech signals converge successfully, yielding systematic and predictable patterns of discourse-level global prosody.
Notes
- 1.
The term “prosodic highlights” in our study refers to prosody-related prominence as perceived in the speech context. Prosodic highlights are defined in terms of perceivably higher/lower pitch and/or relatively stronger/weaker loudness, as well as degrees of contrast (see Sect. 24.2.2 for the definitions of the annotated levels of perceived emphasis). In this chapter, we use perceived prosodic highlights, perceived prominence, and emphasis interchangeably.
- 2.
The hierarchical relationship of discourse-prosodic units is presented in the spirit of the hierarchical prosodic phrase grouping (HPG) framework (Tseng 鄭秋豫 2010; Tseng et al. 2005a, 2005b; Tseng and Su 2008), which our study adopted when annotating the discourse-prosodic units/boundaries in continuous speech. Please refer to Sect. 24.2.2 for a brief introduction to the framework.
- 3.
More precisely, phrasal-level discourse-prosodic units refer to prosodic phrase units (PPh) in the HPG framework. Please refer to Sect. 24.2.2 for further details.
- 4.
We adopted the term “variation” not in its traditional linguistic sense but in the sense used in music studies; it is thus closer to the concept of melodic variation.
- 5.
- 6.
As explained by Tseng (2013), the main strength of the HPG framework for annotating discourse-prosodic units is that it is neither text-bound nor syntactically predetermined. While the framework purposely distances itself from the possible connotations associated with other levels of linguistic information (Tseng 2013), it pays further attention to units at higher discourse-prosodic levels. Since the main focus of our study was to capture the features of speech expressiveness and discourse prosody, it was essential to adopt a framework that takes into consideration prosodic features from units larger than the sentence.
- 7.
As will be shown in Sect. 24.4.1, the chunking sizes at different discourse-prosodic levels differed drastically across speech genres. This is the main reason that we had to remove the length effect.
- 8.
Again, we used the term “variation” in the sense of melodic variation in music studies.
- 9.
Figure 24.2b demonstrates the “letter assignment” in Step Three. We took the first four PPhs in Figure 24.2a to illustrate the procedure of the letter assignment. As explained, whenever the following PPh corresponded to a different ET pattern, a new letter was assigned. As a result, the final alphabetic sequence was not limited to merely A/B combinations as illustrated in Figure 24.2b, as more complex patterns did exist.
- 10.
- 11.
In Tseng and Su (2012), the speech data incorporated in their analyses included the two read speech genres CNA and WB, as well as the spontaneous lecture SpnL. However, the degree of prominence levels did not cover reduction E0 in their study.
- 12.
In their findings, Tseng and Su (2012) suggested that the six major ET patterns are (1) “E1”; (2) “E2 E1”; (3) “E1 E2 E1”; (4) “E1 E2”; (5) “E2”; and (6) “E2 E1 E2”. The cross-speech genre analyses showed that CNA and WB were further distinguished by the “E2 E1” and “E1 E2” patterns, respectively, whereas the lecture data was dominated by the “E1” pattern (Tseng and Su 2012).
- 13.
In Tseng and Su (2012), it was shown that the pattern “E1” only took up about 10% of the total ET patterns in CNA and WB, compared to about 39% of the “E1” pattern found in the total ET patterns in the SpnL data.
- 14.
The calculation of the weighting scores of information allocation was based on normalized BG/PG units.
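The letter assignment described in note 9 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `assign_letters`, the list-of-labels input format, and the sample data are all hypothetical. It also assumes that an ET pattern seen earlier in the sequence reuses its earlier letter, as in the labeling of melodic form; the note itself only states that a new letter is assigned when the following PPh carries a different ET pattern.

```python
from string import ascii_uppercase


def assign_letters(et_patterns):
    """Map a sequence of per-PPh ET patterns to a letter string.

    Assumption (not confirmed by the chapter): a pattern seen before
    reuses its earlier letter; an unseen pattern gets the next letter.
    """
    letter_of = {}   # ET pattern (as a tuple) -> assigned letter
    sequence = []
    for pattern in et_patterns:
        key = tuple(pattern)
        if key not in letter_of:
            if len(letter_of) >= len(ascii_uppercase):
                raise ValueError("more than 26 distinct ET patterns")
            letter_of[key] = ascii_uppercase[len(letter_of)]
        sequence.append(letter_of[key])
    return "".join(sequence)


# Hypothetical input: four PPhs, each labeled with its ET pattern.
pphs = [["E1"], ["E2", "E1"], ["E1"], ["E2", "E1"]]
print(assign_letters(pphs))  # ABAB
```

With three or more distinct ET patterns the same procedure yields sequences beyond A/B combinations, which matches the remark in note 9 that more complex patterns exist.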
References
‘t Hart, Johan, René Collier, and Antonie Cohen. 1990. A perceptual study of intonation: An experimental-phonetic approach to speech melody. Cambridge: Cambridge University Press.
Baumann, Stefan, Oliver Niebuhr, and Bastian Schroeter. 2016. Acoustic cues to perceived prominence levels: Evidence from German spontaneous speech. In Proceedings of Speech Prosody 2016, 711-715. Boston, Massachusetts.
Boersma, Paul, and David Weenink. 2015. Praat: Doing phonetics by computer. www.praat.org. (20 Nov, 2015.)
Campbell, Nick. 2002. Labeling natural conversational speech data. Paper presented at the 2002 Autumn Meeting of the Acoustical Society of Japan (ASJ), 273-274. Akita, Japan.
Chen, Helen K. Y., Laurent Prévot, Roxane Bertrand, Béatrice Priego-Valverde, and Philippe Blache. 2012. Toward a Mandarin-French corpus of interactional data. Paper presented at the 16th Workshop on the Semantics and Pragmatics of Dialogues. Paris, France.
Erickson, Donna. 2005. Expressive speech: Production, perception and application to speech synthesis. Acoustical Science and Technology 26:317-325.
Fujisaki, Hiroya. 2004. Prosody, information, and modeling—With emphasis on tonal features of speech. In Proceedings of Speech Prosody 2004, ed. Bernard Bel and Isabelle Marlien, 1-10. Nara, Japan.
Halliday, Michael A. K. 1970. A course in spoken English: Intonation. London: Oxford University Press.
Kohler, Klaus J. 1997. Modelling prosody in spontaneous speech. In Computing prosody, ed. Sagisaka, Yoshinori, Nick Campbell, and Norio Higuchi, 187-210. New York: Springer.
Patel, Aniruddh D. 2008. Music, language, and the brain. New York: Oxford University Press.
de Saussure, Ferdinand. 1966. Course in general linguistics. (Wade Baskin, Trans.). New York: McGraw-Hill Book Company.
Silverman, Kim E., Mary Beckman, John Pitrelli, Mari Ostendorf, Colin Wightman, Patti Price, Janet B. Pierrehumbert, and Julia Hirschberg. 1992. ToBI: A standard for labeling English prosody. In Proceedings of the 2nd International Conference on Spoken Language Processing (ICSLP) 2: 867-870. Alberta, Canada.
Tatham, Mark, and Katherine Morton. 2004. Expression in speech: Analysis and synthesis. New York: Oxford University Press.
Tseng, Chiu-yu 鄭秋豫. 2010. An F0 analysis of discourse construction and global information in realized narrative prosody 語篇的基頻構組與語流韻律體現. Language and Linguistics 語言暨語言學 11:183-218.
Tseng, Chiu-yu. 2013. Output prosody—How information highlights are piggybacked by discourse structure. Zhongguo Yuyin Xuebao 中國語音學報 4:109-124.
Tseng, Chiu-yu, and Chao-yu Su. 2008. Discourse prosody and context—Global F0 and tempo modulations. In Proceedings of Interspeech 2008, 1200-1203. Brisbane, Australia.
Tseng, Chiu-yu, and Chao-yu Su. 2012. Information allocation and prosodic expressiveness in continuous speech: A Mandarin cross-genre analysis. In Proceedings of the 8th International Symposium on Chinese Spoken Language (ISCSLP 2012), 243-246. Hong Kong.
Tseng, Chiu-yu, and Chao-yu Su. 2014. Where and how to make an emphasis? —L2 distinct prosody and why. In Proceedings of the 9th International Symposium on Chinese Spoken Language (ISCSLP 2014), 633-637. Singapore.
Tseng, Chiu-yu, Yun-ching Cheng, Wei-shan Lee, and Feng-lan Huang. 2003. Collecting Mandarin speech databases for prosody investigation. In Proceedings of the Oriental COCOSDA 2003, 225-232. Singapore.
Tseng, Chiu-yu, Yun-Ching Cheng, and Chun-Hsiang Chang. 2005a. Sinica COSPRO and toolkit—Corpora and platform of Mandarin Chinese fluent speech. In Proceedings of the Oriental COCOSDA 2005, 23-28. Jakarta, Indonesia.
Tseng, Chiu-yu, Shao-huang Pin, Yeh-lin Lee, Hsin-min Wang, and Yong-cheng Chen. 2005b. Fluent speech prosody: Framework and modeling. Speech Communication 46:284-309.
Tseng, Chiu-yu, Lin-shan Lee, and Chao-yu Su. 2008. Spontaneous Mandarin speech prosody—the NTU DSP lecture corpus. In Proceedings of the Oriental COCOSDA, 171-174. Kyoto, Japan.
Tseng, Chiu-yu, Chao-yu Su, and Chi-Feng Huang. 2011. Prosodic highlights in Mandarin continuous speech—Cross-genre attributes and implications. In Proceedings of Interspeech 2011, 1381-1384. Florence, Italy.
Wichmann, Anne. 2014. Intonation in text and discourse: Beginnings, middles and ends. London: Routledge.
Copyright information
© 2023 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Chen, H.KY., Tseng, CY. (2023). Perception Correlated Information Allocation and Pattern Convergence for Discourse Prosody. In: Huang, CR., Hsieh, SK., Jin, P. (eds) Chinese Language Resources. Text, Speech and Language Technology, vol 49. Springer, Cham. https://doi.org/10.1007/978-3-031-38913-9_24
DOI: https://doi.org/10.1007/978-3-031-38913-9_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38912-2
Online ISBN: 978-3-031-38913-9
eBook Packages: Education, Education (R0)