User Generated Dialogue Systems: uDialogue

  • Keiichi Tokuda (Email author)
  • Akinobu Lee
  • Yoshihiko Nankaku
  • Keiichiro Oura
  • Kei Hashimoto
  • Daisuke Yamamoto
  • Ichi Takumi
  • Takahiro Uchiya
  • Shuhei Tsutsumi
  • Steve Renals
  • Junichi Yamagishi

Abstract

This chapter introduces the idea of user-generated dialogue content and describes our experimental exploration aimed at clarifying the mechanism and conditions that make it workable in practice. One of the attractive points of a speech interface is that it provides a vivid sense of interactivity that cannot be achieved with a text interface alone. This study proposes a framework in which spoken dialogue systems are separated into content that can be produced and modified by users and the systems that drive that content, and seeks to clarify (1) the requirements of systems that enable the creation of attractive spoken dialogue, and (2) the conditions for the active generation of attractive dialogue content by users, while attempting to establish a method for realizing them. Experiments for validating user-generated dialogue content were performed by installing interactive digital signage with a speech interface in public spaces as a dialogue device and by implementing a content generation environment for users via the Internet. The proposed framework is expected to lead to a breakthrough in the spread of speech technology.
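
To make the separation between dialogue content and the system that drives it concrete, the sketch below expresses dialogue content as a small finite-state transducer, a plain transition table that users could author and edit, paired with a minimal engine that only walks the transitions. This is an illustrative Python sketch under our own assumptions, not the chapter's actual implementation; the MMDAgent toolkit used by this project has its own FST-based scenario format whose details differ.

    # Minimal sketch of the content/system separation: the dialogue
    # *content* is pure data that users can edit, while the *engine*
    # below stays fixed and merely executes it. Illustrative only; the
    # table format is an assumption, not the project's actual FST syntax.
    CONTENT = [
        # (state, recognized phrase, synthesized reply, next state)
        (0, "hello",   "Hello! How can I help you?", 1),
        (1, "weather", "It is sunny today.",         1),
        (1, "bye",     "Goodbye!",                   0),
    ]

    def respond(state, heard, content=CONTENT):
        """Return (reply, next_state) for a recognized phrase."""
        for src, phrase, reply, dst in content:
            if src == state and phrase == heard:
                return reply, dst
        return "Sorry, I did not catch that.", state  # stay in place

    if __name__ == "__main__":
        state = 0
        for utterance in ["hello", "weather", "bye"]:
            reply, state = respond(state, utterance)
            print(f"user: {utterance} -> system: {reply}")

Because the content is plain data, swapping in user-generated dialogue only means replacing the table; the speech recognition and synthesis components behind the engine remain untouched.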

Keywords

User-generated content · Spoken dialogue system · Speech recognition · Speech synthesis


Copyright information

© Springer Japan KK 2017

Authors and Affiliations

  • Keiichi Tokuda (1) (Email author)
  • Akinobu Lee (1)
  • Yoshihiko Nankaku (1)
  • Keiichiro Oura (1)
  • Kei Hashimoto (1)
  • Daisuke Yamamoto (1)
  • Ichi Takumi (1)
  • Takahiro Uchiya (1)
  • Shuhei Tsutsumi (1)
  • Steve Renals (2)
  • Junichi Yamagishi (3)
  1. Nagoya Institute of Technology, Nagoya, Japan
  2. University of Edinburgh, Edinburgh, UK
  3. National Institute of Informatics, Tokyo, Japan
