Intonation Modelling and Adaptation for Emotional Prosody Generation

Inanoglu, Zeynep; Young, Steve

doi:10.1007/11573548_37

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3784)

Abstract

This paper proposes an HMM-based approach to generating emotional intonation patterns. A set of models was built to represent syllable-length intonation units. In a classification framework, the models were able to detect a sequence of intonation units from raw fundamental frequency values. Using the models in a generative framework, we were able to synthesize smooth and natural-sounding pitch contours. As a case study for emotional intonation generation, Maximum Likelihood Linear Regression (MLLR) adaptation was used to transform the neutral model parameters with a small amount of happy and sad speech data. Perceptual tests showed that listeners could identify the speech with the sad intonation 80% of the time. By contrast, listeners' ability to detect the system-generated happy intonation followed a bimodal distribution, and on average they detected happy intonation only 46% of the time.
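
The MLLR step mentioned in the abstract can be illustrated with a minimal sketch. In MLLR mean adaptation, each Gaussian mean of the neutral model is mapped through a shared affine transform, mu' = A * mu + b. The code below only applies such a transform. It does not estimate it: in real MLLR, A and b are chosen to maximise the likelihood of the adaptation data. The feature layout (F0 and delta-F0), the numeric values, and the function name `mllr_adapt_mean` are assumptions made for illustration and are not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mllr_adapt_mean(mean: Vector, A: Matrix, b: Vector) -> Vector:
    """Apply the affine MLLR mean transform: A @ mean + b."""
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


# Hypothetical neutral state means, laid out as [F0 in Hz, delta-F0].
neutral_means = [[120.0, 0.0], [150.0, 2.0]]

# Hand-picked transform (an assumption, not estimated from data):
# it lowers and compresses pitch, loosely in the direction of "sad" speech.
A = [[0.75, 0.0],
     [0.0, 0.5]]
b = [-5.0, 0.0]

adapted_means = [mllr_adapt_mean(m, A, b) for m in neutral_means]

for before, after in zip(neutral_means, adapted_means):
    print(before, "->", after)
```

Because one transform is shared across many Gaussians, only a small number of parameters must be estimated, which is why a small amount of emotional speech can be enough to adapt a neutral model.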

Author information

Authors and Affiliations

Cambridge University Engineering Department, Machine Intelligence Laboratory, Cambridge, UK
Zeynep Inanoglu & Steve Young

Editor information

Editors and Affiliations

National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences,
Jianhua Tao
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
MIT Media Laboratory, 20 Ames Street, 02139, Cambridge, MA, USA
Rosalind W. Picard

About this paper

Cite this paper

Inanoglu, Z., Young, S. (2005). Intonation Modelling and Adaptation for Emotional Prosody Generation. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_37

DOI: https://doi.org/10.1007/11573548_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29621-8
Online ISBN: 978-3-540-32273-3
eBook Packages: Computer Science, Computer Science (R0)
