Intonation Modelling and Adaptation for Emotional Prosody Generation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3784)

Abstract

This paper proposes an HMM-based approach to generating emotional intonation patterns. A set of models was built to represent syllable-length intonation units. In a classification framework, the models were able to detect a sequence of intonation units from raw fundamental frequency values. Using the models in a generative framework, we were able to synthesize smooth and natural-sounding pitch contours. As a case study for emotional intonation generation, Maximum Likelihood Linear Regression (MLLR) adaptation was used to transform the neutral model parameters with a small amount of happy and sad speech data. Perceptual tests showed that listeners could identify the speech with the sad intonation 80% of the time. Listeners' ability to detect the system-generated happy intonation, on the other hand, followed a bimodal distribution, and on average they detected happy intonation only 46% of the time.
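
As a rough illustration of the adaptation step mentioned above, the sketch below applies an MLLR-style affine transform to a set of Gaussian mean vectors. The abstract does not give the transform's form or dimensions, so everything here is an assumption: the function name `apply_mllr_mean_transform`, the two-dimensional features, and all numeric values are hypothetical, and the estimation of the transform from adaptation data is not shown.

```python
import numpy as np

def apply_mllr_mean_transform(means, W):
    """Apply an MLLR-style mean transform W = [b, A] to each Gaussian mean.

    means: (n_gaussians, d) array of neutral-model means.
    W:     (d, d + 1) array; column 0 is the bias b, the rest is the matrix A.
    Returns the adapted means A @ mu + b, shape (n_gaussians, d).
    """
    means = np.asarray(means, dtype=float)
    W = np.asarray(W, dtype=float)
    # Extended mean vectors xi = [1, mu_1, ..., mu_d]
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T

# Hypothetical 2-D means (for example F0 and its delta); illustrative values only
neutral_means = np.array([[100.0, 0.0],
                          [120.0, 2.0]])

# Hypothetical transform: scale the first dimension by 0.8, shift it by -10,
# and halve the second dimension
W_sad = np.array([[-10.0, 0.8, 0.0],
                  [0.0, 0.0, 0.5]])

adapted_means = apply_mllr_mean_transform(neutral_means, W_sad)
```

In full MLLR, `W` is estimated by maximizing the likelihood of the adaptation data under the transformed model; the sketch covers only the application of an already estimated transform to the means.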

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Inanoglu, Z., Young, S. (2005). Intonation Modelling and Adaptation for Emotional Prosody Generation. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_37

  • DOI: https://doi.org/10.1007/11573548_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29621-8

  • Online ISBN: 978-3-540-32273-3

  • eBook Packages: Computer Science, Computer Science (R0)
