Expressive Speech Synthesis Using Emotion-Specific Speech Inventories

  • Csaba Zainkó
  • Márk Fék
  • Géza Németh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5042)


In this paper we explore the use of emotion-specific speech inventories for expressive speech synthesis. We recorded a semantically neutral sentence and 26 logatoms containing all the diphones and CVC triphones necessary to synthesize the same sentence. The speech material was produced by a professional actress, who rendered every logatom and the sentence in each of the six basic emotions and in a neutral tone. Seven emotion-dependent inventories were constructed from the logatoms. Pairing each of the seven inventories with the prosody extracted from each of the seven natural sentences yielded 49 synthesized sentences. A total of 194 listeners evaluated the emotions expressed in the logatoms and in the natural and synthetic sentences. The intended emotion was recognized above chance level for 99% of the logatoms and for all natural sentences. Recognition rates significantly above chance level were obtained for each emotion. The recognition rate for some synthetic sentences exceeded that of natural ones.
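The crossed design described in the abstract (every inventory combined with every prosody source) and the notion of a chance level in a forced-choice test can be illustrated with a minimal Python sketch. Several things here are assumptions and are not stated in the abstract: the emotion label names, the idea that listeners chose among exactly seven response categories (which would give a chance level of 1/7), and the use of a one-sided binomial tail as the significance test. The function names are hypothetical.

```python
from itertools import product
from math import comb

# ASSUMPTION: the abstract only says "six basic emotions" plus neutral;
# these label names are illustrative, not taken from the paper.
EMOTIONS = ["neutral", "anger", "disgust", "fear", "joy", "sadness", "surprise"]


def synthesis_conditions(labels):
    """Pair every inventory emotion with every prosody emotion."""
    return list(product(labels, repeat=2))


def binomial_tail(successes, trials, p):
    """Return P(X >= successes) for X ~ Binomial(trials, p).

    Used here as a one-sided test of whether a recognition count
    exceeds what guessing with probability p would produce.
    """
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


conditions = synthesis_conditions(EMOTIONS)

# Conditions where inventory and prosody come from the same emotion.
matched = [(inv, pros) for inv, pros in conditions if inv == pros]

# ASSUMPTION: forced choice among all seven categories.
chance_level = 1 / len(EMOTIONS)

print(len(conditions), len(matched), round(chance_level, 3))
```

Under these assumptions, calling `binomial_tail(k, n, chance_level)` with `k` correct identifications out of `n` responses gives the probability of reaching at least that count by guessing alone; the paper's actual statistical procedure may differ.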


Expressive speech synthesis, basic emotions, diphone and triphone inventory, listening test, forced choice





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Csaba Zainkó (1)
  • Márk Fék (1)
  • Géza Németh (1)
  1. Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
