User Generated Dialogue Systems: uDialogue

Abstract

This chapter introduces the idea of user-generated dialogue content and describes our experimental exploration aimed at clarifying the mechanisms and conditions that make it workable in practice. One attractive aspect of a speech interface is the vivid sense of interactivity it provides, which cannot be achieved with a text interface alone. This study proposes a framework in which spoken dialogue systems are separated into content that can be produced and modified by users and the systems that drive that content. It seeks to clarify (1) the requirements for systems that enable the creation of attractive spoken dialogue, and (2) the conditions under which users actively generate attractive dialogue content, while attempting to establish methods for realizing both. To validate user-generated dialogue content, we installed interactive digital signage with a speech interface in public spaces as a dialogue device and implemented a content-generation environment that users could access via the Internet. The proposed framework is expected to lead to a breakthrough in the spread of speech technology.
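
The separation the abstract describes can be pictured with a minimal sketch: a user-editable dialogue "content" file, expressed here as a hypothetical FST-style transition table (loosely in the spirit of the .fst dialogue scripts used by toolkits such as MMDAgent), driven by a small generic engine. The format, the state and keyword names, and the respond function are illustrative assumptions, not the chapter's actual specification; speech recognition and synthesis are stood in for by plain text.

    # Illustrative sketch only: the dialogue CONTENT is data that users
    # can create and modify, kept separate from the generic engine below.
    # The 4-tuple format (state, trigger keyword, response, next state)
    # is a hypothetical FST-style table, not the chapter's specification.

    CONTENT = [
        ("start", "hello",  "Hello! Ask me about today's events.",  "menu"),
        ("menu",  "events", "A robot demo runs at noon in hall A.", "menu"),
        ("menu",  "bye",    "Goodbye!",                             "start"),
    ]

    def respond(state, user_input):
        """Generic engine: take the first transition from the current
        state whose keyword appears in the (recognized) utterance."""
        for s, keyword, response, nxt in CONTENT:
            if s == state and keyword in user_input.lower():
                return response, nxt
        return "Sorry, I didn't catch that.", state  # stay in current state

    # Example run; the utterance list stands in for a speech recognizer
    # and print() stands in for a speech synthesizer.
    state = "start"
    for utterance in ["Hello there", "Any events today?", "OK, bye"]:
        reply, state = respond(state, utterance)
        print(f"user: {utterance!r} -> system: {reply!r}")

The design point the sketch illustrates is that users edit only CONTENT; the engine, and in a full system the recognizer and synthesizer behind it, stay fixed.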



Author information


Corresponding author

Correspondence to Keiichi Tokuda.


Copyright information

© 2017 Springer Japan KK

About this chapter

Cite this chapter

Tokuda, K. et al. (2017). User Generated Dialogue Systems: uDialogue. In: Nishida, T. (eds) Human-Harmonized Information Technology, Volume 2. Springer, Tokyo. https://doi.org/10.1007/978-4-431-56535-2_3

  • DOI: https://doi.org/10.1007/978-4-431-56535-2_3

  • Publisher Name: Springer, Tokyo

  • Print ISBN: 978-4-431-56533-8

  • Online ISBN: 978-4-431-56535-2

  • eBook Packages: Computer Science, Computer Science (R0)
