Abstract
This paper proposes an HMM-based approach to generating emotional intonation patterns. A set of models was built to represent syllable-length intonation units. In a classification framework, the models were able to detect a sequence of intonation units from raw fundamental frequency values. Used in a generative framework, the models synthesized smooth and natural-sounding pitch contours. As a case study for emotional intonation generation, Maximum Likelihood Linear Regression (MLLR) adaptation was used to transform the neutral model parameters with a small amount of happy and sad speech data. Perceptual tests showed that listeners identified speech with the sad intonation 80% of the time. Detection of the system-generated happy intonation, by contrast, was bimodally distributed across listeners, and on average listeners detected happy intonation only 46% of the time.
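The idea behind MLLR mean adaptation can be illustrated with a deliberately simplified sketch. It assumes one-dimensional F0 state means, unit variances, a hard frame-to-state alignment, and a single global transform, under which the maximum-likelihood estimate reduces to an ordinary least-squares fit of mu' = a * mu + b. The function names and all numeric values are hypothetical and chosen for illustration only; the abstract does not describe the paper's actual feature set, regression classes, or toolkit.

```python
from typing import List, Sequence, Tuple


def estimate_global_mean_transform(
    neutral_means: Sequence[float],
    frames: Sequence[float],
    alignment: Sequence[int],
) -> Tuple[float, float]:
    """Fit mu' = a * mu + b by least squares over aligned adaptation frames.

    neutral_means: F0 mean of each neutral HMM state (hypothetical values).
    frames:        observed F0 values from the emotional adaptation data.
    alignment:     index of the state each frame is assigned to (assumed known).
    """
    if len(frames) != len(alignment) or not frames:
        raise ValueError("frames and alignment must be non-empty and equal length")

    n = len(frames)
    xs = [neutral_means[s] for s in alignment]  # neutral mean seen by each frame
    mean_x = sum(xs) / n
    mean_y = sum(frames) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)

    if var_x == 0.0:
        # Only one distinct state mean was observed: a slope cannot be
        # estimated, so fall back to a pure bias shift.
        return 1.0, mean_y - mean_x

    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, frames))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def adapt_means(neutral_means: Sequence[float], a: float, b: float) -> List[float]:
    """Apply the shared transform to every state mean, including unseen states."""
    return [a * m + b for m in neutral_means]


# Illustrative data only: three neutral state means (Hz) and six frames of
# lower, compressed F0 aligned two per state.
neutral = [100.0, 120.0, 140.0]
sad_frames = [90.0, 90.0, 100.0, 100.0, 110.0, 110.0]
sad_alignment = [0, 0, 1, 1, 2, 2]

a, b = estimate_global_mean_transform(neutral, sad_frames, sad_alignment)
sad_means = adapt_means(neutral, a, b)
```

Because the transform is shared across states, a small amount of emotional data can shift every state mean, including states that never appear in the adaptation set. That sharing is what makes adaptation from limited data feasible.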
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Inanoglu, Z., Young, S. (2005). Intonation Modelling and Adaptation for Emotional Prosody Generation. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_37
DOI: https://doi.org/10.1007/11573548_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29621-8
Online ISBN: 978-3-540-32273-3
eBook Packages: Computer Science, Computer Science (R0)