A Study on a Spoken Dialogue System with Cooperative Emotional Speech Synthesis Using Acoustic and Linguistic Information

Conference paper in: Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2018)

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 110)

Abstract

This study examines an emotion labeling method for the system utterances of a non-task-oriented spoken dialogue system. A previous study proposed cooperative emotion labeling, which generates emotional speech with an emotion label estimated from the user and system utterances. However, that method cannot decide the emotion label when no emotion can be estimated from the linguistic information. We therefore propose a method that uses both acoustic and linguistic information for emotion recognition. In this paper, we first report the performance of emotion recognition using acoustic features, and then conduct a scenario-based dialogue experiment to verify the effectiveness of the proposed emotion labeling method.
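As a concrete illustration of the labeling logic the abstract describes, the following is a minimal Python sketch of the fallback rule: prefer an emotion label estimated from linguistic information, and back off to a label recognized from acoustic features when the linguistic route yields nothing. All function names, emotion labels, keywords, and thresholds here are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch of the fallback strategy described in the abstract.
# The system first tries to label the emotion from linguistic information;
# when that yields nothing, it backs off to a label recognized from
# acoustic features. Everything below is an illustrative stand-in.

from typing import Optional


def estimate_from_text(utterance: str) -> Optional[str]:
    """Toy linguistic estimator: a keyword lookup standing in for a real
    dictionary-based or learned classifier. Returns None when the text
    carries no recognizable emotional expression."""
    lexicon = {
        "great": "joy", "happy": "joy",
        "sorry": "sadness", "sad": "sadness",
        "angry": "anger", "annoying": "anger",
    }
    for word, label in lexicon.items():
        if word in utterance.lower():
            return label
    return None  # no emotion recoverable from linguistic information


def estimate_from_acoustics(features: dict) -> str:
    """Toy acoustic estimator: thresholds on pitch and energy statistics
    standing in for a classifier over acoustic features. Unlike the
    linguistic route, it always returns some label."""
    if features["f0_mean"] > 200.0 and features["energy"] > 0.7:
        return "joy"
    if features["f0_mean"] < 120.0:
        return "sadness"
    return "neutral"


def cooperative_emotion_label(user_text: str, acoustic_features: dict) -> str:
    """Pick the emotion label for the next system utterance: prefer the
    linguistic estimate, fall back to acoustics when it is unavailable."""
    label = estimate_from_text(user_text)
    if label is None:
        label = estimate_from_acoustics(acoustic_features)
    return label


if __name__ == "__main__":
    feats = {"f0_mean": 230.0, "energy": 0.8}
    # Linguistic route succeeds:
    print(cooperative_emotion_label("That sounds great!", feats))    # joy
    # Linguistic route fails; the acoustic fallback decides:
    print(cooperative_emotion_label("I went to the store.", feats))  # joy
```

In the paper's setting, the acoustic estimate would presumably come from a trained classifier over features such as pitch and energy statistics rather than the fixed thresholds used here; the sketch only shows where such a classifier would slot into the decision rule.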

Acknowledgments

Part of this work was supported by JSPS KAKENHI Grant Numbers JP17H00823 and JP18K18136.

Author information

Correspondence to Akinori Ito.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Yamanaka, M., Chiba, Y., Nose, T., Ito, A. (2019). A Study on a Spoken Dialogue System with Cooperative Emotional Speech Synthesis Using Acoustic and Linguistic Information. In: Pan, J.S., Ito, A., Tsai, P.W., Jain, L. (eds) Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP 2018. Smart Innovation, Systems and Technologies, vol 110. Springer, Cham. https://doi.org/10.1007/978-3-030-03748-2_12
