Abstract
This paper presents a framework for simulating four basic emotional styles of Vietnamese speech by applying acoustic feature transplantation techniques to neutral utterances. First, it describes analyses of the acoustic features of Vietnamese emotional speech, carried out to find the relations between prosodic and voice-quality variations and emotional states. Target pitch profiles, together with duration, energy and spectrum constraints, are then obtained by applying rules inferred from the analysis results. These rules are based on the idea that, when emotional speech is synthesized from neutral speech, acoustic features are modified more strongly in some syllables rather than uniformly in all syllables. Finally, neutral speech is morphed to produce synthesized emotional speech. Results of perceptual tests show that the emotional styles were well recognized.
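The key idea in the abstract, modifying acoustic features more strongly in some syllables than in others, can be illustrated with a minimal sketch. It assumes each neutral syllable is represented by an F0 contour, a duration and a mean energy, and that an emotion rule is a set of target-to-neutral ratios. The emotion labels, the `EmotionRule` fields, the `RULES` values, the focus/base weights and every function name below are hypothetical placeholders, not the rules derived in the paper. Spectral and voice-quality modification and the final resynthesis step are not covered.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class Syllable:
    f0: List[float]    # F0 contour samples in Hz (voiced frames only)
    duration: float    # syllable duration in seconds
    energy_db: float   # mean syllable energy in dB


@dataclass(frozen=True)
class EmotionRule:
    f0_mean_ratio: float    # target mean F0 / neutral mean F0
    f0_range_ratio: float   # target F0 excursion / neutral F0 excursion
    duration_ratio: float   # target duration / neutral duration
    energy_delta_db: float  # energy offset in dB


# HYPOTHETICAL placeholder rules for illustration only; not the paper's values.
RULES: Dict[str, EmotionRule] = {
    "joy": EmotionRule(1.20, 1.40, 0.95, 3.0),
    "anger": EmotionRule(1.10, 1.30, 0.90, 5.0),
    "sadness": EmotionRule(0.90, 0.70, 1.20, -3.0),
}


def syllable_weights(n: int, focus: List[int],
                     focus_weight: float = 1.0,
                     base_weight: float = 0.4) -> List[float]:
    """Hypothetical per-syllable strength: full rule on focus syllables, weaker elsewhere."""
    return [focus_weight if i in focus else base_weight for i in range(n)]


def interpolate(ratio: float, w: float) -> float:
    """w = 0 leaves a feature unchanged (ratio 1); w = 1 applies the full rule ratio."""
    return 1.0 + w * (ratio - 1.0)


def modify(sylls: List[Syllable], rule: EmotionRule,
           weights: List[float]) -> List[Syllable]:
    """Apply one emotion rule non-uniformly, syllable by syllable."""
    out = []
    for s, w in zip(sylls, weights):
        mu = mean(s.f0)
        m = interpolate(rule.f0_mean_ratio, w)
        r = interpolate(rule.f0_range_ratio, w)
        # Scale the excursion around the syllable mean by r, then scale the mean by m.
        new_f0 = [m * (mu + r * (x - mu)) for x in s.f0]
        out.append(Syllable(
            f0=new_f0,
            duration=s.duration * interpolate(rule.duration_ratio, w),
            energy_db=s.energy_db + w * rule.energy_delta_db,
        ))
    return out


if __name__ == "__main__":
    neutral = [Syllable([200.0, 220.0, 240.0], 0.20, 60.0),
               Syllable([180.0, 200.0, 220.0], 0.25, 58.0)]
    # Treat the second syllable as the one that carries most of the emotion.
    sad = modify(neutral, RULES["sadness"], syllable_weights(len(neutral), focus=[1]))
    for syl in sad:
        print([round(f, 1) for f in syl.f0], round(syl.duration, 3), syl.energy_db)
```

Interpolating each ratio toward 1 by a per-syllable weight keeps the sketch simple. A weight of zero reproduces the neutral syllable, so uniform modification is just the special case where every weight is equal.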
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ngo, T.D., Akagi, M., Bui, T.D. (2015). Toward a Rule-Based Synthesis of Vietnamese Emotional Speech. In: Nguyen, VH., Le, AC., Huynh, VN. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-11680-8_11
DOI: https://doi.org/10.1007/978-3-319-11680-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11679-2
Online ISBN: 978-3-319-11680-8
eBook Packages: Engineering, Engineering (R0)