User Generated Dialogue Systems: uDialogue

Abstract

This chapter introduces the idea of user-generated dialogue content and describes our experimental exploration aimed at clarifying the mechanisms and conditions that make it workable in practice. One attractive aspect of a speech interface is the vivid sense of interactivity it provides, which cannot be achieved with a text interface alone. This study proposes a framework in which spoken dialogue systems are separated into content that can be produced and modified by users and the systems that drive that content. It seeks to clarify (1) the requirements for systems that enable the creation of attractive spoken dialogue, and (2) the conditions under which users actively generate attractive dialogue content, while attempting to establish methods for realizing both. To validate user-generated dialogue content, we installed interactive digital signage with a speech interface in public spaces as a dialogue device and implemented a content-generation environment that users could access via the Internet. The proposed framework is expected to lead to a breakthrough in the spread of speech technology.
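
The separation the abstract describes can be pictured with a minimal sketch: a user-editable dialogue "content" file, expressed here as a hypothetical FST-style transition table (loosely in the spirit of the .fst dialogue scripts used by toolkits such as MMDAgent), driven by a small generic engine. The format, the state and keyword names, and the respond function are illustrative assumptions, not the chapter's actual specification; speech recognition and synthesis are stood in for by plain text.

    # Illustrative sketch only: the dialogue CONTENT is data that users
    # can create and modify, kept separate from the generic engine below.
    # The 4-tuple format (state, trigger keyword, response, next state)
    # is a hypothetical FST-style table, not the chapter's specification.

    CONTENT = [
        ("start", "hello",  "Hello! Ask me about today's events.",  "menu"),
        ("menu",  "events", "A robot demo runs at noon in hall A.", "menu"),
        ("menu",  "bye",    "Goodbye!",                             "start"),
    ]

    def respond(state, user_input):
        """Generic engine: take the first transition from the current
        state whose keyword appears in the (recognized) utterance."""
        for s, keyword, response, nxt in CONTENT:
            if s == state and keyword in user_input.lower():
                return response, nxt
        return "Sorry, I didn't catch that.", state  # stay in current state

    # Example run; the utterance list stands in for a speech recognizer
    # and print() stands in for a speech synthesizer.
    state = "start"
    for utterance in ["Hello there", "Any events today?", "OK, bye"]:
        reply, state = respond(state, utterance)
        print(f"user: {utterance!r} -> system: {reply!r}")

The design point the sketch illustrates is that users edit only CONTENT; the engine, and in a full system the recognizer and synthesizer behind it, stay fixed.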



Author information


Corresponding author

Correspondence to Keiichi Tokuda.


Copyright information

© 2017 Springer Japan KK

About this chapter

Cite this chapter

Tokuda, K. et al. (2017). User Generated Dialogue Systems: uDialogue. In: Nishida, T. (eds) Human-Harmonized Information Technology, Volume 2. Springer, Tokyo. https://doi.org/10.1007/978-4-431-56535-2_3

  • DOI: https://doi.org/10.1007/978-4-431-56535-2_3

  • Publisher Name: Springer, Tokyo

  • Print ISBN: 978-4-431-56533-8

  • Online ISBN: 978-4-431-56535-2

  • eBook Packages: Computer Science, Computer Science (R0)
