On Appropriateness and Estimation of the Emotion of Synthesized Response Speech in a Spoken Dialogue System

  • Taketo Kase
  • Takashi Nose
  • Akinori Ito
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 528)


Paralinguistic features such as the emotion of an utterance are as important as its linguistic content for generating better response utterances in spoken dialog systems. In this research, we carried out an experiment to reveal the effect of emotional speech synthesis in a dialogue system, and investigated which methods are effective for adding emotion to synthetic speech. First, we conducted an experiment in which an agent spoke to the user with various kinds of emotional speech, and the appropriateness of the emotion was evaluated. As expected, users had a better impression of the agent when emotion was added appropriately. Next, we examined methods for automatically estimating the emotion of the system's response, and found that the best method was to give the response the same emotion as the user's previous utterance, regardless of the content of the system's utterance.
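The best-performing strategy described above can be sketched in a few lines: the system's response emotion simply mirrors the emotion estimated from the user's previous utterance, ignoring the response text. This is a minimal illustration, not the authors' implementation; the function name and emotion label set are assumptions for the sketch.

```python
# Sketch of the emotion-mirroring response policy reported as best in the
# paper: reuse the emotion of the user's previous utterance. The label
# inventory below is an illustrative assumption, not taken from the paper.
EMOTIONS = {"neutral", "happy", "sad", "angry"}

def choose_response_emotion(user_emotion: str) -> str:
    """Mirror the user's emotion; fall back to neutral for unknown labels."""
    return user_emotion if user_emotion in EMOTIONS else "neutral"
```

For example, if the user's last utterance was classified as "happy", the synthesized response would also be rendered as "happy", independent of what the response actually says.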


Keywords: Spoken dialog system · Emotional speech synthesis · Response generation



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Graduate School of Engineering, Tohoku University, Sendai, Japan
