Perception Correlated Information Allocation and Pattern Convergence for Discourse Prosody

Chen, Helen Kai-Yun; Tseng, Chiu-Yu

doi:10.1007/978-3-031-38913-9_24

Part of the book series: Text, Speech and Language Technology (TLTB, volume 49)

Abstract

This chapter presents a study of speech expressiveness and information arrangement in relation to discourse prosody in continuous Mandarin speech. Based on corpus data from four diverse speech genres, the study examined how perceived prosodic highlights correlate with speech expressiveness and how prominence patterns converge for discourse prosody. Using a corpus-linguistic approach and quantitative analyses, we first summarized the number of perceived emphasis token patterns and their distribution across speech genres. We then conducted two experiments: (i) an analysis of speech expressiveness through an information weighting calculation based on the allocation of prosodic highlights and (ii) an analysis of discourse prosody through the convergence of patterned prosodic highlights in limited degrees of contrastive strength. The first experiment pinpointed major differences in expressiveness across speech genres and showed that the most spontaneous type of speech carried the largest amount of information. The second experiment found that a limited number of intonation patterns converged for higher-level discourse prosody. Overall, the research identified the sources contributing to speech expressiveness and diversity across speech genres and showed that divergent surface variations in speech signals converge into systematic and predictable patterns of discourse-level global prosody.

Notes

1.
The term “prosodic highlights” in our study refers to prosody-related prominence as perceived in the speech context. Prosodic highlights are defined in terms of perceivably higher/lower pitch and/or relatively stronger/weaker loudness, as well as degrees of contrast (see Sect. 24.2.2 for the definitions of the levels used in annotating perceived emphases). In this chapter, we use perceived prosodic highlights, perceived prominence, and emphasis interchangeably.
2.
The hierarchical relationship of discourse-prosodic units is presented in the spirit of the hierarchical prosodic phrase grouping (HPG) framework (Tseng 鄭秋豫 2010; Tseng et al. 2005a, 2005b; Tseng and Su 2008) that our study adopted when annotating the discourse-prosodic units/boundaries in the continuous speeches. Please refer to Sect. 24.2.2 for a brief introduction of the framework.
3.
More precisely, phrasal-level discourse-prosodic units refer to prosodic phrase units (PPh) in the HPG framework. Please refer to Sect. 24.2.2 for further details.
4.
We adopted the term “variation” not in its traditional linguistic sense but in the sense used in music studies; it is thus closer to the concept of melodic variation.
5.
As introduced in Tseng et al. (2003) and Tseng et al. (2005a), Sinica COSPRO is an intonation-balanced speech corpus originally designed to examine the role of intonation and prosodic grouping in Mandarin continuous speech.
6.
As explained by Tseng (2013), the main strength of the HPG framework for annotating discourse-prosodic units is that it is neither text-bound nor syntactically predetermined. While the framework purposely distances itself from the connotations associated with other levels of linguistic information (Tseng 2013), it pays closer attention to units at higher discourse-prosodic levels. Since the main focus of our study was to capture the features of speech expressiveness and discourse prosody, it was essential to adopt a framework that takes into account prosodic features of units larger than the sentence.
7.
As will be shown in Sect. 24.4.1, the chunking sizes at different discourse-prosodic levels differed drastically across speech genres. This is the main reason why we had to remove the length effect.
8.
Again, we used the term “variation” in the sense of melodic variation in music studies.
9.
Figure 24.2b demonstrates the “letter assignment” in Step Three. We took the first four PPhs in Figure 24.2a to illustrate the procedure of the letter assignment. As explained, whenever the following PPh corresponded to a different ET pattern, a new letter was assigned. As a result, the final alphabetic sequence was not limited to merely A/B combinations as illustrated in Figure 24.2b, as more complex patterns did exist.
10.
Since the spontaneous speeches SpnL and SpnC were annotated with the reduction E0, we calculated the total number of ET patterns separately for the read and lecture speeches together (see Table 24.4a, without reduction) and spontaneous speeches (see Table 24.4b, with reduction).
11.
The speech data analyzed in Tseng and Su (2012) included the two read speech genres CNA and WB, as well as the spontaneous lecture SpnL. However, the prominence levels in their study did not include the reduction level E0.
12.
In their findings, Tseng and Su (2012) suggested that the six major ET patterns are (1) “E1”; (2) “E2 E1”; (3) “E1 E2 E1”; (4) “E1 E2”; (5) “E2”; and (6) “E2 E1 E2”. The cross-speech genre analyses showed that CNA and WB were further distinguished by the “E2 E1” and “E1 E2” patterns, respectively, whereas the lecture data was dominated by the “E1” pattern (Tseng and Su 2012).
13.
Tseng and Su (2012) showed that the “E1” pattern accounted for only about 10% of the total ET patterns in CNA and WB, compared with about 39% of the total ET patterns in the SpnL data.
14.
The calculation of the weighting scores of information allocation was based on normalized BG/PG units.
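
The letter assignment described in Note 9 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `assign_letters`, the tuple encoding of ET patterns, and the example input are hypothetical. It also assumes that a pattern recurring later in the sequence reuses the letter it first received, which the note suggests but does not state outright.

```python
from string import ascii_uppercase


def assign_letters(pph_patterns):
    """Map a sequence of per-PPh ET patterns to a letter string.

    Assumption (not confirmed by the source): a pattern that recurs
    later in the sequence reuses the letter it was first given.
    """
    letter_of = {}  # ET pattern -> assigned letter
    sequence = []
    for pattern in pph_patterns:
        key = tuple(pattern)
        if key not in letter_of:
            if len(letter_of) == len(ascii_uppercase):
                raise ValueError("more than 26 distinct ET patterns")
            # A pattern not seen before receives the next unused letter.
            letter_of[key] = ascii_uppercase[len(letter_of)]
        sequence.append(letter_of[key])
    return "".join(sequence)


# Hypothetical input: four PPhs, each labeled with its ET pattern.
example = [("E1",), ("E2", "E1"), ("E1",), ("E2", "E1")]
print(assign_letters(example))  # ABAB
```

With three or more distinct ET patterns the same procedure yields sequences beyond A/B combinations, in line with the note's remark that more complex patterns exist.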

References

’t Hart, Johan, René Collier, and Antonie Cohen. 1990. A perceptual study of intonation: An experimental-phonetic approach to speech melody. Cambridge: Cambridge University Press.
Baumann, Stefan, Oliver Niebuhr, and Bastian Schroeter. 2016. Acoustic cues to perceived prominence levels: Evidence from German spontaneous speech. In Proceedings of Speech Prosody 2016, 711-715. Boston, Massachusetts.
Boersma, Paul, and David Weenink. 2015. Praat: Doing phonetics by computer. www.praat.org. (20 Nov, 2015.)
Campbell, Nick. 2002. Labeling natural conversational speech data. Paper presented at the 2002 Autumn Meeting of Acoustic Society of Japan (ASJ), 273-274. Akita, Japan.
Chen, Helen K. Y., Laurent Prévot, Roxane Bertrand, Béatrice Priego-Valverde, and Philippe Blache. 2012. Toward a Mandarin-French corpus of interactional data. Paper presented at the 16th Workshop on the Semantics and Pragmatics of Dialogues. Paris, France.
Erickson, Donna. 2005. Expressive speech: Production, perception and application to speech synthesis. Acoustical Science and Technology 26:317-325.
Fujisaki, Hiroya. 2004. Prosody, information, and modeling—With emphasis on tonal features of speech. In Proceedings of Speech Prosody 2004, ed. Bernard Bel and Isabelle Marlien, 1-10. Nara, Japan.
Halliday, Michael A. K. 1970. A course in spoken English: Intonation. London: Oxford University Press.
Kohler, Klaus J. 1997. Modelling prosody in spontaneous speech. In Computing prosody, ed. Sagisaka, Yoshinori, Nick Campbell, and Norio Higuchi, 187-210. New York: Springer.
Patel, Aniruddh D. 2008. Music, language, and the brain. New York: Oxford University Press.
de Saussure, Ferdinand. 1966. Course in general linguistics. (Wade Baskin, Trans.). New York: McGraw-Hill Book Company.
Silverman, Kim E., Mary Beckman, John Pitrelli, Mari Ostendorf, Colin Wightman, Patti Price, Janet B. Pierrehumbert, and Julia Hirschberg. 1992. ToBI: A standard for labeling English prosody. In Proceedings of the 2nd International Conference on Spoken Language Processing (ICSLP) 2: 867-870. Alberta, Canada.
Tatham, Mark, and Katherine Morton. 2004. Expression in speech: Analysis and synthesis. New York: Oxford University Press.
Tseng, Chiu-yu 鄭秋豫. 2010. An F0 analysis of discourse construction and global information in realized narrative prosody 語篇的基頻構組與語流韻律體現. Language and Linguistics 語言暨語言學 11:183-218.
Tseng, Chiu-yu. 2013. Output prosody—How information highlights are piggybacked by discourse structure. Zhongguo Yuyin Xuebao 中國語音學報 4:109-124.
Tseng, Chiu-yu, and Chao-yu Su. 2008. Discourse prosody and context—Global F0 and tempo modulations. In Proceedings of Interspeech 2008, 1200-1203. Brisbane, Australia.
Tseng, Chiu-yu, and Chao-yu Su. 2012. Information allocation and prosodic expressiveness in continuous speech: A Mandarin cross-genre analysis. In Proceedings of the 8th International Symposium on Chinese Spoken Language (ISCSLP 2012), 243-246. Hong Kong.
Tseng, Chiu-yu, and Chao-yu Su. 2014. Where and how to make an emphasis? —L2 distinct prosody and why. In Proceedings of the 9th International Symposium on Chinese Spoken Language (ISCSLP 2014), 633-637. Singapore.
Tseng, Chiu-yu, Yun-ching Cheng, Wei-shan Lee, and Feng-lan Huang. 2003. Collecting Mandarin speech databases for prosody investigation. In Proceedings of the Oriental COCOSDA 2003, 225-232. Singapore.
Tseng, Chiu-yu, Yun-Ching Cheng, and Chun-Hsiang Chang. 2005a. Sinica COSPRO and toolkit—Corpora and platform of Mandarin Chinese fluent speech. In Proceedings of the Oriental COCOSDA 2005, 23-28. Jakarta, Indonesia.
Tseng, Chiu-yu, Shao-huang Pin, Yeh-lin Lee, Hsin-min Wang, and Yong-cheng Chen. 2005b. Fluent speech prosody: Framework and modeling. Speech Communication 46:284-309.
Tseng, Chiu-yu, Lin-shan Lee, and Chao-yu Su. 2008. Spontaneous Mandarin speech prosody—the NTU DSP lecture corpus. In Proceedings of the Oriental COCOSDA, 171-174. Kyoto, Japan.
Tseng, Chiu-yu, Chao-yu Su, and Chi-Feng Huang. 2011. Prosodic highlights in Mandarin continuous speech—Cross-genre attributes and implications. In Proceedings of Interspeech 2011, 1381-1384. Florence, Italy.
Wichmann, Anne. 2014. Intonation in text and discourse: Beginnings, middles and ends. London: Routledge.

Author information

Authors and Affiliations

National Central University, Taoyuan City, Taiwan
Helen Kai-Yun Chen
Institute of Linguistics, Academia Sinica, Taipei, Taiwan
Chiu-Yu Tseng

Corresponding author

Correspondence to Helen Kai-Yun Chen.

Editor information

Editors and Affiliations

Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Chu-Ren Huang
Graduate Institute of Linguistics, National Taiwan University, Taipei, Taiwan
Shu-Kai Hsieh
School of Electronic Information and Artificial Intelligence, Leshan Normal University, Leshan City, Sichuan, China
Peng Jin

About this chapter

Cite this chapter

Chen, H.KY., Tseng, CY. (2023). Perception Correlated Information Allocation and Pattern Convergence for Discourse Prosody. In: Huang, CR., Hsieh, SK., Jin, P. (eds) Chinese Language Resources. Text, Speech and Language Technology, vol 49. Springer, Cham. https://doi.org/10.1007/978-3-031-38913-9_24

DOI: https://doi.org/10.1007/978-3-031-38913-9_24
Published: 19 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38912-2
Online ISBN: 978-3-031-38913-9
eBook Packages: Education (R0)
