A Study on a Spoken Dialogue System with Cooperative Emotional Speech Synthesis Using Acoustic and Linguistic Information

Yamanaka, Mai; Chiba, Yuya; Nose, Takashi; Ito, Akinori

doi:10.1007/978-3-030-03748-2_12

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 110)

Included in the following conference series:

International Conference on Intelligent Information Hiding and Multimedia Signal Processing

Abstract

This study examines an emotion labeling method for the system utterances of a non-task-oriented spoken dialogue system. A previous study proposed cooperative emotion labeling, which synthesizes emotional speech using an emotion label estimated from the user and system utterances. However, that method cannot decide on an emotion label when no emotion can be estimated from the linguistic information. We therefore propose a method that uses both acoustic and linguistic information for emotion recognition. In this paper, we first report the emotion recognition performance obtained with the acoustic features. We then conduct a scenario-based dialogue experiment to verify the effectiveness of the proposed emotion labeling method.
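
The abstract does not say how the acoustic and linguistic estimates are combined. As a rough illustration only, the sketch below assumes a simple fallback rule: use the linguistic estimate when one exists, otherwise the acoustic estimate, otherwise a default label. The function name `choose_emotion_label`, the label strings, and the `"neutral"` default are hypothetical and are not taken from the paper.

```python
from typing import Optional


def choose_emotion_label(
    linguistic: Optional[str],
    acoustic: Optional[str],
    default: str = "neutral",
) -> str:
    """Pick an emotion label for the system utterance.

    Assumed fallback rule (not confirmed by the paper):
    linguistic estimate first, then acoustic estimate, then a default.
    None means that the corresponding recognizer produced no estimate.
    """
    if linguistic is not None:
        return linguistic
    if acoustic is not None:
        return acoustic
    return default


if __name__ == "__main__":
    # Linguistic information yields no emotion, so the acoustic estimate is used.
    print(choose_emotion_label(None, "sadness"))
```

Under this assumption, the case that the abstract describes as a problem (no emotion estimated from the linguistic information) no longer leaves the system without a label, because the acoustic estimate fills the gap.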

Acknowledgments

Part of this work was supported by JSPS KAKENHI Grant Numbers JP17H00823 and JP18K18136.

Author information

Authors and Affiliations

Graduate School of Engineering, Tohoku University, Aramaki Aza Aoba 6–6–05, Aoba-ku, Sendai-shi, Miyagi, 980–8579, Japan
Mai Yamanaka, Yuya Chiba, Takashi Nose & Akinori Ito

Corresponding author

Correspondence to Akinori Ito.

Editor information

Editors and Affiliations

College of Information Science and Engineering, Fujian University of Technology, Fuzhou, Fujian, China
Jeng-Shyang Pan
Graduate School of Engineering, Tohoku University, Sendai, Miyagi, Japan
Akinori Ito
Swinburne University of Technology, Hawthorn, VIC, Australia
Pei-Wei Tsai
Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Yamanaka, M., Chiba, Y., Nose, T., Ito, A. (2019). A Study on a Spoken Dialogue System with Cooperative Emotional Speech Synthesis Using Acoustic and Linguistic Information. In: Pan, JS., Ito, A., Tsai, PW., Jain, L. (eds) Recent Advances in Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP 2018. Smart Innovation, Systems and Technologies, vol 110. Springer, Cham. https://doi.org/10.1007/978-3-030-03748-2_12

DOI: https://doi.org/10.1007/978-3-030-03748-2_12
Published: 11 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03747-5
Online ISBN: 978-3-030-03748-2
eBook Packages: Intelligent Technologies and Robotics (R0)
