A Study on a Spoken Dialogue System with Cooperative Emotional Speech Synthesis Using Acoustic and Linguistic Information
This study examines an emotion labeling method for system utterances of a non-task-oriented spoken dialogue system. A previous study proposed cooperative emotion labeling, which generates emotional speech with an emotion label estimated from the user and system utterances. However, that method could not decide the emotion label when no emotion was estimated from the linguistic information. We therefore propose a method that uses both acoustic and linguistic information for emotion recognition. In this paper, we first report the performance of emotion recognition using acoustic features. Then, a scenario-based dialogue experiment is conducted to verify the effectiveness of the proposed emotion labeling method.
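The combination of the two information sources can be illustrated with a minimal sketch. This is an assumption about the overall scheme, not the paper's exact method: a hypothetical linguistic estimator is tried first, and the system falls back to an acoustic estimate when the text yields no emotion (the situation the abstract identifies as the problem with linguistic-only labeling). All function names and the toy lexicon are illustrative.

```python
# Hedged sketch: fall back to acoustic emotion recognition when the
# linguistic estimator cannot decide. Names and lexicon are hypothetical.

def estimate_from_text(utterance: str):
    """Toy keyword-based linguistic estimator; None when no emotion is found."""
    lexicon = {"great": "joy", "sorry": "sadness", "terrible": "anger"}
    for word, emotion in lexicon.items():
        if word in utterance.lower():
            return emotion
    return None  # linguistic information gives no emotion

def estimate_from_audio(acoustic_scores: dict) -> str:
    """Toy acoustic estimator: pick the highest-scoring emotion class."""
    return max(acoustic_scores, key=acoustic_scores.get)

def label_emotion(utterance: str, acoustic_scores: dict) -> str:
    text_emotion = estimate_from_text(utterance)
    if text_emotion is not None:
        return text_emotion                      # linguistic label available
    return estimate_from_audio(acoustic_scores)  # fall back to acoustics

print(label_emotion("That is great news!", {"joy": 0.2, "sadness": 0.1}))
print(label_emotion("Hmm, I see.", {"joy": 0.1, "sadness": 0.7}))
```

With the first utterance the linguistic estimator decides ("joy"); with the second, no keyword matches, so the acoustic scores decide ("sadness"). Other fusion strategies (e.g., weighted score combination) would fit the same interface.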
Part of this work was supported by JSPS KAKENHI Grant Numbers JP17H00823 and JP18K18136.