Toward a Rule-Based Synthesis of Vietnamese Emotional Speech

Ngo, Thi Duyen; Akagi, Masato; Bui, The Duy

doi:10.1007/978-3-319-11680-8_11

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 326)

Abstract

This paper presents a framework for simulating four basic emotional styles of Vietnamese speech by applying acoustic feature transplantation techniques to neutral utterances. First, it describes analyses of the acoustic features of Vietnamese emotional speech, carried out to find the relations between prosodic and voice-quality variations and emotional states. Target pitch profiles, together with duration, energy and spectrum constraints, are then obtained by applying rules inferred from these analysis results. The rules are based on the idea that, when emotional speech is synthesized from neutral speech, acoustic features are modified more strongly in some syllables rather than uniformly across all syllables. Neutral speech is then morphed according to these targets to produce synthesized emotional speech. Results of perceptual tests show that the synthesized emotional styles were well recognized.
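The central idea of the abstract, that rule-derived changes are applied more strongly to some syllables than to others, can be illustrated with a small sketch. It is not the authors' implementation: the `Syllable` fields, the `RULES` scale factors and the per-syllable `weights` are hypothetical placeholders, and the paper's actual rules (and its spectral modifications) are not reproduced here. The sketch only computes target feature values; producing audio would additionally require a vocoder for pitch, duration and spectrum manipulation.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Syllable:
    f0_hz: List[float]  # F0 samples (Hz) over the syllable's voiced part
    duration_s: float   # syllable duration in seconds
    energy: float       # mean energy on a linear scale


# Hypothetical rule table: global scale factors per emotion.
# These numbers are illustrative placeholders, NOT values from the paper.
RULES: Dict[str, Dict[str, float]] = {
    "joy": {"f0": 1.3, "duration": 0.9, "energy": 1.4},
    "sadness": {"f0": 0.85, "duration": 1.2, "energy": 0.7},
}


def _blend(factor: float, weight: float) -> float:
    """Interpolate between no change (1.0) and the full rule factor."""
    return 1.0 + weight * (factor - 1.0)


def apply_rules(syllables: Sequence[Syllable], emotion: str,
                weights: Sequence[float]) -> List[Syllable]:
    """Return new syllables with emotion rules applied non-uniformly.

    weights[i] = 1.0 applies the full rule to syllable i,
    weights[i] = 0.0 leaves it unchanged (hypothetical weighting scheme).
    """
    if len(weights) != len(syllables):
        raise ValueError("need exactly one weight per syllable")
    rule = RULES[emotion]
    out: List[Syllable] = []
    for syl, w in zip(syllables, weights):
        k_f0 = _blend(rule["f0"], w)
        out.append(replace(
            syl,
            f0_hz=[f * k_f0 for f in syl.f0_hz],
            duration_s=syl.duration_s * _blend(rule["duration"], w),
            energy=syl.energy * _blend(rule["energy"], w),
        ))
    return out
```

For example, `apply_rules(neutral, "joy", weights=[0.3, 1.0])` on a two-syllable utterance changes the first syllable only slightly and applies the full "joy" factors to the second. Interpolating each factor toward 1.0 keeps the weighting scheme simple, and a weight of zero guarantees that a syllable is left untouched.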



Author information

Authors and Affiliations

University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam
Thi Duyen Ngo & The Duy Bui
School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Japan
Masato Akagi

Corresponding author

Correspondence to Thi Duyen Ngo.

Editor information

Editors and Affiliations

Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
Viet-Ha Nguyen
Faculty of Information Technology, VNU University of Engineering and Technology, Hanoi, Vietnam
Anh-Cuong Le
School of Knowledge Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Van-Nam Huynh

About this paper

Cite this paper

Ngo, T.D., Akagi, M., Bui, T.D. (2015). Toward a Rule-Based Synthesis of Vietnamese Emotional Speech. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_11

DOI: https://doi.org/10.1007/978-3-319-11680-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11679-2
Online ISBN: 978-3-319-11680-8
eBook Packages: Engineering, Engineering (R0)
